Preface


This volume, Computer Vision and Recognition Systems: Research Innovations
and Trends, brings together contributions from authors in Thailand, Spain,
Japan, Turkey, Australia, and India. It focuses on the essential modules
needed to give machines the power of vision within artificial intelligence
applications. To imitate human sight, a computer vision system needs to
acquire, store, interpret, and understand images.

Despite the remarkable growth of neural networks, machine learning, and
deep learning, surprisingly few books present these topics in the form of
research contributions on computer vision and recognition systems. The
main objective of this book is to present innovative research
developments, applications, and current trends in computer vision and
recognition systems.

We are thankful to our contributors for their quality submissions, which
cover research topics such as visual quality improvement, Parkinson’s
disease diagnosis, hypertensive retinopathy detection through retinal
fundus images, big image data processing, N-grams for image
classification, medical brain images, chatbot applications, credit score
improvisation, vision-based lane vehicle detection, damaged vehicle parts
recognition, partial image encryption of medical images, and image
synthesis.

The content is organized into chapters that together provide support in
the various areas where computer vision is being used. Computer vision
helps computers perceive images and label them. The subject areas include
different approaches to computer vision, image processing, and machine
learning frameworks for building automated and stable applications. Deep
learning is also covered through application-based chapters, along with
pattern recognition and biometric systems.


CHAPTER 1


Visual Quality Improvement Using 
Single Image Defogging Technique 


PRITAM VERMA and VIJAY KUMAR*


Department of Computer Science and Engg., National Institute of
Technology, Hamirpur, Himachal Pradesh, India


*Corresponding author. E-mail: vijaykumarchahar@gmail.com


ABSTRACT 


In the winter season, haze is a major challenge during driving because it
reduces the visibility of the scene in an image. Fog removal techniques
are required to improve the visibility level of the image. In this
chapter, a hybrid approach is implemented for fog removal. The proposed
approach utilizes the basic concepts of the Dark Channel Prior and the
Bright Channel Prior. In addition, an order statistic filter is used to
refine the transmission map, and the bright channel prior with boundary
constraints is used to restore the edges. The proposed technique has been
compared with existing techniques over a set of well-known foggy images.
It outperforms these techniques in terms of average gradient and
percentage of saturated pixels.


1.1 INTRODUCTION 


Many weather-related incidents happen due to fog. In the year 2016, around
9000 people died due to intense fog. The visual quality of images? is
degraded by the presence of dust, smoke, etc. Differences between fog,
haze, and rain are described in Table 1.1.!° The core cause of fog in the
environment is the suspension of water droplets.!* These water droplets
cause absorption and scattering. The light traveling toward the camera or
the viewer is attenuated by scattering through the droplets, which
distorts the visual quality of the image.**’ To overcome this problem,
some sophisticated systems have been developed to maximize visibility
while restraining the strong and dazzling light of oncoming vehicles.'* A
motor vehicle detection system was developed for the recognition of
fog,®!! but its main drawback was that it was not able to remove the sky
visibility. The automatic fog detection could detect only daytime fog; it
was not able to detect nighttime fog. To overcome this problem, computer
vision techniques have come into use.'':'* These techniques also helped to
cut down the operating cost and provided a better visual system.'°”> He et
al.'® proposed the Dark Channel Prior (DCP), which utilizes image pixels
that have a low intensity value in at least one of the color channels.
Nevertheless, the contrast of this value can be reduced by the additive
air light. DCP is commonly used to estimate the transmission map and the
atmospheric veil.?”°


TABLE 1.1 Weather Conditions and the Corresponding Particle Size.


Condition   Type            Radius (μm)    Concentration (cm⁻³)
Fog         Water droplet   1–10           100–10
Haze        Aerosol         10⁻²–1         10³–10
Rain        Water droplet   10²–10⁴        10⁻²–10⁻⁵
Cloud       Water droplet   1–10           300–10


Fattal'® described the local color-line prior to restore hazy images.
Nandal and Kumar (2018) proposed a novel image defogging model that uses
fractional-order anisotropic diffusion. They used the air light map
estimated from the hazy model as the image in the anisotropic diffusion
process. However, the result suffered from halo artifacts. To reduce this
problem, a technique was implemented'? that uses an improved DCP and
contrast adaptive histogram equalization and is able to remove the halo
artifacts with a new median operator in the DCP. A guided filter was used
to refine the transmission map. Contrast Limited Adaptive Histogram
Equalization (CLAHE) was used for further visibility improvement, but the
computational complexity was high. To cut down the complexity, an
integrated DCP and Bright Channel Prior (BCP) approach was developed.” BCP
was used to solve the sky-region problem associated with DCP-based
dehazing.° A gain intervention filter was used to increase the computation
speed and improve edge preservation. In spite of this, the technique was
not able to provide the optimum solution for degraded images. To reduce
the above-mentioned problems, a hybrid algorithm is implemented that
integrates the DCP and BCP. The proposed approach uses a 2D order
statistic filter to refine the transmission map. BCP with boundary
constraints is used to restore the edges. The technique is compared with
existing techniques over a set of well-known foggy images. The remainder
of this chapter is organized as follows. Section 1.2 briefly describes the
degradation model. The proposed defogging technique is presented in
Section 1.3. The experimental setup and results are given in Section 1.4.
The concluding remarks are given in Section 1.5.


1.2 DEGRADATION MODELS 


The mathematical model of a foggy image is represented as:

$$\mathrm{obI}(x) = \mathrm{hfI}(x)\, e^{-\theta d(x)} + \mathrm{Air}\,\bigl(1 - e^{-\theta d(x)}\bigr) \tag{1.1}$$


here, x denotes the image coordinates, obI is the observed hazy image,
hfI is the haze-free image, Air is the global atmospheric light, θ is the
scattering coefficient, and d is the scene depth. The transmission is
represented as””:


$$\mathrm{tra}(x) = e^{-\theta d(x)} \tag{1.2}$$


In clear weather, θ ≈ 0. However, θ becomes non-negligible for foggy
images. The first term of eq. 1.1, hfI(x)tra(x), decreases as the scene
depth increases, and the second term, Air(1 − tra(x)), increases as the
scene depth increases. The main aim of fog removal is to recover hfI from
obI. Air and tra can be estimated from obI, and hfI can then be obtained
as'®:


$$\mathrm{hfI}(x) = \frac{\mathrm{obI}(x) - \mathrm{Air}}{\mathrm{tra}(x)} + \mathrm{Air} \tag{1.3}$$
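
As a minimal numerical sketch of eqs. 1.1 to 1.3, the following Python fragment fogs a single pixel value and then inverts the model. The function names, the sample intensities, and the depth and θ values are illustrative assumptions and are not taken from the chapter's implementation.

```python
import math


def transmission(depth, theta):
    """Eq. 1.2: tra(x) = exp(-theta * d(x))."""
    return math.exp(-theta * depth)


def add_fog(hf, air, tra):
    """Eq. 1.1 for one pixel value: obI = hfI * tra + Air * (1 - tra)."""
    return hf * tra + air * (1.0 - tra)


def remove_fog(ob, air, tra):
    """Eq. 1.3 for one pixel value: hfI = (obI - Air) / tra + Air."""
    return (ob - air) / tra + air


# Hypothetical values: intensities in [0, 1], depth in arbitrary units.
hf_pixel, air_light = 0.30, 0.90
tra = transmission(depth=2.0, theta=0.5)      # exp(-1)
ob_pixel = add_fog(hf_pixel, air_light, tra)  # pulled toward the air light
restored = remove_fog(ob_pixel, air_light, tra)
```

The foggy value lies between the haze-free value and the air light, and inverting the model with the same Air and tra returns the original value up to rounding.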


1.2.1. DARK CHANNEL PRIOR 


According to the DCP, an RGB image has at least one color channel in
which some pixels have very low intensities that tend to zero. Consider,
for example, an image of mountains, stones, trees, and some brighter
objects. In such an image, the mountains and stones have lower
intensities than the brighter objects and the sky region. The dark channel
is mathematically represented as'®:

$$\mathrm{hfI}^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \Bigl( \min_{c \in \{R,G,B\}} \mathrm{hfI}^{c}(y) \Bigr) \tag{1.4}$$

where $\mathrm{hfI}^{c}$ denotes the intensity of the color channel
c ∈ {R,G,B} of the RGB image and Ω(x) is a local patch centered at pixel
x. The minimum value over the three color channels and over all pixels in
the patch is taken as the dark channel $\mathrm{hfI}^{\mathrm{dark}}$. The
dark channel pixel value can be approximated as follows'*:


$$\mathrm{hfI}^{\mathrm{dark}}(x) \approx 0 \tag{1.5}$$


This approximation of the dark channel pixel values by zero is known as
the DCP. In contrast, the dark channel of a foggy image contains pixels
with values greater than zero. The global atmospheric light tends to be
achromatic and bright, so the combination of air light and direct
attenuation significantly increases the minimum value of the three color
channels in the local patch. This means that the pixel values of the dark
channel can play a particular role in estimating the fog density.
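
The dark channel of eq. 1.4 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the chapter's implementation: the image is a small list of rows of (R, G, B) tuples in [0, 1], the patch is a square clipped at the border, and the name `dark_channel` and the sample pixels are hypothetical.

```python
def dark_channel(image, radius=1):
    """Dark channel: min over a square patch and over the R, G, B channels.

    `image` is a list of rows; each pixel is an (R, G, B) tuple in [0, 1].
    Patches are clipped at the image border.
    """
    height, width = len(image), len(image[0])
    # Per-pixel minimum across the three color channels.
    channel_min = [[min(pixel) for pixel in row] for row in image]
    dark = []
    for i in range(height):
        row = []
        for j in range(width):
            patch = [
                channel_min[m][n]
                for m in range(max(0, i - radius), min(height, i + radius + 1))
                for n in range(max(0, j - radius), min(width, j + radius + 1))
            ]
            row.append(min(patch))
        dark.append(row)
    return dark


clear = [[(0.9, 0.1, 0.2), (0.8, 0.7, 0.0)],
         [(0.3, 0.6, 0.05), (0.5, 0.4, 0.9)]]
# Fog the same scene with tra = 0.5 and Air = 0.8 in every channel.
foggy = [[tuple(0.5 * c + 0.8 * 0.5 for c in px) for px in row] for row in clear]

dark_clear = dark_channel(clear)
dark_foggy = dark_channel(foggy)
```

For this toy scene the dark channel of the clear image is zero everywhere, while fogging lifts it to Air(1 − tra) = 0.4, which is the effect that makes the dark channel usable as a fog density cue.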


1.2.2 DCP-BASED IMAGE DEFOGGING 


In the DCP-based image defogging algorithm, the dark channel is computed
from the input image (see eq. 1.4). The atmospheric light and the
transmission map are obtained from the dark channel. The transmission map
is further refined and the fog-free image is recovered using eq. 1.3. The
degradation is mathematically represented as°:


$$\mathrm{obI}(x) = \mathrm{hfI}(x)\, e^{-\theta d(x)} + \mathrm{Air}\,\bigl(1 - e^{-\theta d(x)}\bigr) \tag{1.6}$$


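The minimum intensity over the local patch Ω(x) in each color channel c is
obtained by dividing both sides of eq. 1.6 by $\mathrm{Air}^{c}$ and
applying the min operator, as follows:


$$\min_{y \in \Omega(x)} \frac{\mathrm{obI}^{c}(y)}{\mathrm{Air}^{c}} = \mathrm{tra}(x) \min_{y \in \Omega(x)} \frac{\mathrm{hfI}^{c}(y)}{\mathrm{Air}^{c}} + 1 - \mathrm{tra}(x) \tag{1.7}$$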
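Then, the min operator over the three color channels is applied to
eq. 1.7 as follows:

$$\min_{c} \Bigl( \min_{y \in \Omega(x)} \frac{\mathrm{obI}^{c}(y)}{\mathrm{Air}^{c}} \Bigr) = \mathrm{tra}(x) \min_{c} \Bigl( \min_{y \in \Omega(x)} \frac{\mathrm{hfI}^{c}(y)}{\mathrm{Air}^{c}} \Bigr) + 1 - \mathrm{tra}(x) \tag{1.8}$$

Since the dark channel of the haze-free image tends to zero (eq. 1.5),
the first term on the right-hand side vanishes, and tra(x) can be
evaluated as:

$$\mathrm{tra}(x) = 1 - \min_{c} \Bigl( \min_{y \in \Omega(x)} \frac{\mathrm{obI}^{c}(y)}{\mathrm{Air}^{c}} \Bigr) \tag{1.9}$$


The dark channel pixel values are highly associated with fog density.
Therefore, the 0.1% brightest pixels in the dark channel are selected, and
the color with the highest intensity value among the chosen pixels is used
as the value for Air. For the sky region, DCP is not reliable. If the
color of the sky is close to Air in the hazy image, then
$\min_{c}(\min_{y}(\mathrm{obI}^{c}(y)/\mathrm{Air}^{c}))$ approaches 1
and tra(x) approaches 0. For given Air, tra(x), and obI(x), the haze-free
image can be mathematically represented as'*:


$$\mathrm{hfI}(x) = \frac{\mathrm{obI}(x) - \mathrm{Air}}{\max(\mathrm{tra}(x),\, p_0)} + \mathrm{Air} \tag{1.10}$$


here, the lower bound for transmission is denoted by $p_0$.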
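
The three steps described above (choosing Air from the brightest dark-channel pixels, estimating the transmission with eq. 1.9, and recovering the image with eq. 1.10) can be sketched as follows. All function names and sample values are hypothetical, pixel intensity is taken here as the sum of R, G, and B (an assumption, since the text does not define it), and a patch radius of 0 is used in the example only to keep the numbers easy to follow.

```python
def patch_min(values, radius):
    """Minimum of a 2D list over a square patch clipped at the border."""
    h, w = len(values), len(values[0])
    return [[min(values[m][n]
                 for m in range(max(0, i - radius), min(h, i + radius + 1))
                 for n in range(max(0, j - radius), min(w, j + radius + 1)))
             for j in range(w)] for i in range(h)]


def estimate_air(image, dark, fraction=0.001):
    """Keep the brightest `fraction` of dark-channel pixels, then return the
    color of the candidate with the highest intensity (sum of R, G, B)."""
    coords = [(i, j) for i in range(len(dark)) for j in range(len(dark[0]))]
    coords.sort(key=lambda ij: dark[ij[0]][ij[1]], reverse=True)
    keep = max(1, int(len(coords) * fraction))
    i, j = max(coords[:keep], key=lambda ij: sum(image[ij[0]][ij[1]]))
    return image[i][j]


def estimate_transmission(image, air, radius=1):
    """Eq. 1.9: tra(x) = 1 - min_c min_y obI^c(y) / Air^c."""
    normalized = [[min(px[c] / air[c] for c in range(3)) for px in row]
                  for row in image]
    return [[1.0 - v for v in row] for row in patch_min(normalized, radius)]


def recover(image, air, tra, p0=0.1):
    """Eq. 1.10: hfI = (obI - Air) / max(tra, p0) + Air, per channel."""
    return [[tuple((px[c] - air[c]) / max(t, p0) + air[c] for c in range(3))
             for px, t in zip(row, tra_row)]
            for row, tra_row in zip(image, tra)]


foggy = [[(0.55, 0.60, 0.58), (0.80, 0.82, 0.81)],
         [(0.50, 0.45, 0.62), (0.70, 0.40, 0.66)]]
dark = patch_min([[min(px) for px in row] for row in foggy], radius=0)
air = estimate_air(foggy, dark)
tra = estimate_transmission(foggy, air, radius=0)
restored = recover(foggy, air, tra)
```

The pixel whose color equals Air gets a transmission of exactly 0, which is the sky-like case discussed above; the lower bound $p_0$ keeps the division in eq. 1.10 well defined there.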


1.2.3 BRIGHT CHANNEL PRIOR 


For low-illumination color images, image enhancement techniques are
frequently used. Many brightness enhancement algorithms depend on the
BCP, whose focus is gray removal. The local patches in well-illuminated
images contain some pixels that have very high intensities in at least one
color channel. The formation of a foggy image is defined as:


$$\mathrm{obI}(x) = \mathrm{hfI}(x)\, e^{-\theta d(x)} + \mathrm{Air}\,\bigl(1 - e^{-\theta d(x)}\bigr) \tag{1.11}$$


where x represents the image coordinates, obI is the observed hazy image,
hfI is the haze-free image, Air is the global atmospheric light, θ is the
scattering coefficient of the atmosphere, and d is the scene depth. The
transmission map is defined as:


$$\mathrm{tra}(x) = e^{-\theta d(x)} \tag{1.12}$$
The transformed model is given as:


$$\frac{\mathrm{obI}^{c}(x)}{\mathrm{Air}^{c}} = \mathrm{tra}(x)\, \frac{\mathrm{hfI}^{c}(x)}{\mathrm{Air}^{c}} + 1 - \mathrm{tra}(x) \tag{1.13}$$


where c ∈ {r, g, b} is the color channel index. The bright channel is
calculated on both sides of eq. 1.13 by applying the maximum operators:

$$\max_{y \in \Omega(x)} \Bigl( \max_{c} \frac{\mathrm{obI}^{c}(y)}{\mathrm{Air}^{c}} \Bigr) = \mathrm{tra}(x) \max_{y \in \Omega(x)} \Bigl( \max_{c} \frac{\mathrm{hfI}^{c}(y)}{\mathrm{Air}^{c}} \Bigr) + 1 - \mathrm{tra}(x) \tag{1.14}$$


where $\mathrm{hfI}^{c}$ is a color channel of hfI and Ω(x) is a local
patch centered at x. We assume that the transmission within the patch is
tra(x). The goal of this model is to recover hfI, Air, and tra from obI.
The bright channel of low-illumination images is defined as'’:


$$\mathrm{hfI}^{\mathrm{bright}}(x) = \max_{y \in \Omega(x)} \Bigl( \max_{c \in \{R,G,B\}} \mathrm{hfI}^{c}(y) \Bigr) \tag{1.15}$$
Tes (x) = = afre (x) el : 16) 


here, $\mathrm{hfI}^{\mathrm{bright}}(x)$ is the bright channel.
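
The bright channel of eq. 1.15 mirrors the dark channel with the min operators replaced by max operators. The sketch below assumes a list-of-rows image of (R, G, B) tuples and square patches clipped at the border; the function name and the sample pixels are hypothetical.

```python
def bright_channel(image, radius=1):
    """Eq. 1.15: max over a square patch and over the R, G, B channels."""
    height, width = len(image), len(image[0])
    # Per-pixel maximum across the three color channels.
    channel_max = [[max(pixel) for pixel in row] for row in image]
    return [[max(channel_max[m][n]
                 for m in range(max(0, i - radius), min(height, i + radius + 1))
                 for n in range(max(0, j - radius), min(width, j + radius + 1)))
             for j in range(width)] for i in range(height)]


image = [[(0.2, 0.1, 0.3), (0.4, 0.9, 0.5), (0.1, 0.2, 0.2)],
         [(0.3, 0.3, 0.6), (0.2, 0.5, 0.4), (0.7, 0.1, 0.3)]]
bright_patch = bright_channel(image, radius=1)  # 3 x 3 patches
bright_pixel = bright_channel(image, radius=0)  # per-pixel channel maximum
```

With a radius of 1 every patch in this small image contains the single bright green value 0.9, so the whole bright channel takes that value; with a radius of 0 the result reduces to the per-pixel channel maximum.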


1.3. PROPOSED DEFOGGING ALGORITHM 


The proposed defogging algorithm is inspired by the work of Singh and
Kumar. The improvements over that work are as follows:


a) A double BCP (DBCP) is used instead of a single BCP in order to
solve the sky-region problem. One BCP is computed by utilizing
the boundary constraints and the other BCP is computed by
utilizing the padded image.

b) A 2D order statistic filter is used to preserve the edge information
and allow defogging in smooth areas.

c) DCP with boundary constraints is used. The atmospheric light
and the integrated transmission map are estimated by using DCP
with boundary constraints and DBCP.


1.3.1 BOUNDARY CONSTRAINTS


A boundary constraint is a lower and upper bound on the solution x.
Holding the solution within these bounds helps generate faster and more
reliable solutions. The bounds are vectors with the same length as x.


• If a component has no lower bound, use -Inf as its bound; use Inf if
it has no upper bound.

• If only upper bounds or only lower bounds exist, the other type does
not need to be written. For example, if there are no upper bounds,
there is no need to supply a vector of Infs.

• Out of n components, if only the first m have bounds, then a vector
of length m containing the bounds is sufficient.


For example, suppose the bounds are x₃ ≥ 7 and x₂ ≤ 3. The constraint
vectors can be lower bound = [-Inf;-Inf;7] and upper bound = [Inf;3]
(which will give a warning) or upper bound = [Inf;3;Inf].
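
The bound vectors from this example can be illustrated with a short Python sketch. The helper names `expand_bounds` and `clamp` are hypothetical; padding a short vector with infinities is an assumption that mirrors the rule for partially specified bounds stated above.

```python
import math


def expand_bounds(lower, upper, n):
    """Pad partial bound vectors to length n with -inf / +inf."""
    lb = list(lower) + [-math.inf] * (n - len(lower))
    ub = list(upper) + [math.inf] * (n - len(upper))
    return lb, ub


def clamp(x, lb, ub):
    """Project each component of x into the interval [lb_i, ub_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lb, ub)]


# The example from the text: third component >= 7, second component <= 3.
lb, ub = expand_bounds([-math.inf, -math.inf, 7], [math.inf, 3], n=3)
projected = clamp([10.0, 5.0, 2.0], lb, ub)
```

Here the short upper-bound vector [Inf;3] is completed to [Inf;3;Inf], and the candidate solution is pulled back to the nearest point that satisfies both bounds.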


1.3.2 DCP WITH BOUNDARY CONSTRAINTS 


This is used to eliminate the fog from the foggy image. DCP uses the
patch-wise transmission from boundary constraints. It takes the hazy
image and the air light, checks the pixel-wise boundary for each color
channel (RGB), and applies the max filter to the combined (concatenated)
result set of the RGB channels.


1.3.3 BCP WITH BOUNDARY CONSTRAINTS 

This is used to eliminate the sky-region problem from the foggy image.
BCP uses the patch-wise transmission from boundary constraints. It takes
the hazy image and the air light, checks the pixel-wise boundary for each
color channel (RGB), and applies the max filter to the combined
(concatenated) result set of the RGB channels.


1.3.4 BCP WITH PAD IMAGE 


The pad size (for an array A) is a vector of non-negative integers that
determines both the amount of padding and the dimension along which it is
added. The amount of padding is determined by the value of an element in
the vector, and the dimension along which the padding is added is
determined by the position of that element in the vector. The pad size
and the padded image are obtained by using the pad array operation. BCP
uses both the hazy image and the frame size. This gives the maximum value
in each patch, that is, the brightest pixel. For better visual quality of
the recovered image, the algorithm also uses BCP with boundary
constraints.
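
The role of the pad size can be sketched as follows. This is an illustration only: `pad_image` and `patch_max` are hypothetical helpers, the pad size is read as (rows, columns) added on each side, and padding with negative infinity before taking the patch maximum is an assumption made here so that padded cells never win the maximum.

```python
def pad_image(image, pad_size, value=0.0):
    """Pad a 2D list with `value`; pad_size = (rows, cols) added on each side."""
    pad_rows, pad_cols = pad_size
    width = len(image[0]) + 2 * pad_cols
    top = [[value] * width for _ in range(pad_rows)]
    bottom = [[value] * width for _ in range(pad_rows)]
    body = [[value] * pad_cols + list(row) + [value] * pad_cols for row in image]
    return top + body + bottom


def patch_max(image, radius):
    """Maximum over (2*radius+1)^2 patches, using a padded copy of the image."""
    padded = pad_image(image, (radius, radius), value=float("-inf"))
    size = 2 * radius + 1
    return [[max(padded[i + m][j + n] for m in range(size) for n in range(size))
             for j in range(len(image[0]))] for i in range(len(image))]


gray = [[1, 2],
        [3, 4]]
padded = pad_image(gray, (1, 2))   # one row and two columns on each side
brightest = patch_max(gray, radius=1)
```

The first element of the pad size controls the rows added and the second the columns, so a 2 × 2 image padded with (1, 2) becomes 4 × 6.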


1.3.5 ALGORITHM 


1. Take the foggy image as input.

2. Estimate the air light using the 2D order statistic filter.

3. Apply DCP with boundary constraints (eq. 1.4).

4. Estimate the transmission map given by eq. 1.9.

5. Apply BCP with boundary constraints (eq. 1.16).

6. Apply BCP with the padded image and estimate the transmission map.

7. Integrate the two transmission maps obtained from BCP and DCP:


Cy) (1.17) 


OVI Be eho 


—int egrated — bright 


Tr (x)= (7 oy / [7 


8. Pass the integrated transmission map into the defogging model:


— integrated 


_ gzpordfilt2 
| a! | (1.18) 
max(7r (x), Po) 


here, $p_0$ represents the lower bound.
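
The 2D order statistic filter used in the algorithm replaces each pixel by the value of a chosen rank within its neighborhood. The Python sketch below is a generic illustration with hypothetical names and values; it assumes a square window and zero padding outside the image, and it does not reproduce the filter settings of the proposed method.

```python
def order_statistic_filter(image, order, size=3):
    """Replace each pixel by the `order`-th smallest value (1-based) in its
    size x size neighborhood; positions outside the image count as 0.0."""
    radius = size // 2
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = sorted(
                image[m][n] if 0 <= m < h and 0 <= n < w else 0.0
                for m in range(i - radius, i + radius + 1)
                for n in range(j - radius, j + radius + 1)
            )
            row.append(window[order - 1])
        out.append(row)
    return out


tmap = [[0.2, 0.8, 0.3],
        [0.4, 0.9, 0.5],
        [0.1, 0.6, 0.7]]
median = order_statistic_filter(tmap, order=5)   # 5th of 9 values: median
maximum = order_statistic_filter(tmap, order=9)  # 9th of 9 values: maximum
minimum = order_statistic_filter(tmap, order=1)  # 1st of 9 values: minimum
```

Choosing the rank selects the behavior: the middle rank gives a median filter, while the lowest and highest ranks give the min and max filters that underlie the dark and bright channels.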


1.4 EXPERIMENTAL RESULTS 


To verify the performance of the proposed technique, it is compared
with existing dehazing algorithms over 20 images.


1.4.1 EXPERIMENTAL SETUP 


This section presents the assessment of the proposed method, implemented
in MATLAB 9.0 on a 64-bit Intel(R) Core(TM) i3-5005U processor with 4 GB
of memory. To compare the performance of the proposed defogging
technique, the benchmark foggy images Canon, Toys, Pumpkins, and Cones
are taken from the well-known SPOT database. The proposed defogging
technique is compared with four well-known techniques [16, 22–24].


1.4.2. QUANTITATIVE ANALYSIS 


The performance of the proposed hybrid technique is evaluated in terms
of saturated pixels and average gradient. For better visual quality, the
value of saturated pixels should be minimal and the average gradient
should be maximal. The values of these performance metrics are reported
as mean ± standard deviation. Table 1.2 shows the saturated pixels
obtained from the proposed dehazing technique and the compared
algorithms. It is observed from Table 1.2 that the proposed defogging
technique produces fewer saturated pixels than the competing algorithms.
Table 1.3 shows the average gradient obtained from the proposed technique
and the compared algorithms. It can be seen from Table 1.3 that the
proposed technique preserves the edges better than the other algorithms.
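
The chapter does not give formulas for the two metrics, so the sketch below uses common textbook forms as assumptions: the average gradient as the mean of sqrt((Δx² + Δy²)/2) over forward differences, and saturated pixels as the percentage of pixels clipped to pure black or pure white. The function names and test images are hypothetical, and the values in Tables 1.2 and 1.3 may have been computed with different definitions.

```python
import math


def average_gradient(gray):
    """Mean of sqrt((dx^2 + dy^2) / 2) using forward differences."""
    h, w = len(gray), len(gray[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            dx = gray[i][j + 1] - gray[i][j]
            dy = gray[i + 1][j] - gray[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((h - 1) * (w - 1))


def saturated_percentage(gray, low=0.0, high=1.0):
    """Percentage of pixels clipped to pure black or pure white."""
    pixels = [v for row in gray for v in row]
    clipped = sum(1 for v in pixels if v <= low or v >= high)
    return 100.0 * clipped / len(pixels)


flat = [[0.5, 0.5], [0.5, 0.5]]    # no edges, no clipped pixels
edges = [[0.0, 1.0], [1.0, 0.0]]   # strong edges, every pixel clipped
```

A flat image scores zero on both metrics, whereas the checkerboard has a large average gradient but is fully saturated, which is why the two metrics are read together.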


TABLE 1.2 Performance Comparison in Terms of Saturated Pixels.


Image    [16]        [23]        [24]        [22]        Proposed Approach
Image1   0.99±0.06   0.97±0.04   0.82±0.05   0.52±0.04   0.47±0.05
Image2   0.97±0.03   0.81±0.06   0.74±0.04   0.59±0.03   0.52±0.02
Image3   0.85±0.04   0.72±0.05   0.70±0.07   0.54±0.06   0.49±0.07
Image4   0.89±0.08   0.85±0.07   0.81±0.09   0.55±0.07   0.50±0.04


TABLE 1.3 Performance Comparison in Terms of Average Gradient.


Image    [16]        [23]        [24]        [22]        Proposed Approach
Image1   1.20±0.04   1.28±0.07   1.34±0.06   1.57±0.07   1.89±0.02
Image2   1.34±0.05   1.53±0.09   1.67±0.08   1.72±0.05   1.90±0.03
Image3   1.15±0.03   1.31±0.06   1.45±0.04   1.51±0.08   1.97±0.04
Image4   1.32±0.06   1.57±0.04   1.62±0.05   1.69±0.04   1.82±0.03


1.4.3 QUALITATIVE ANALYSIS


The qualitative analysis of the proposed algorithm is carried out on
benchmark foggy images. Figure 1.1 illustrates the process performed by
the proposed technique. It can be seen from Figure 1.1 that the proposed
technique is able to eliminate the fog and preserve the edges.


FIGURE 1.1 Defogging process: (a–d) foggy images, (e–h) dark channel prior, (i–l)
double bright channel prior, (m–p) integrated transmission maps, and (q–t) final defogged
images.


1.5 CONCLUSIONS 


In this chapter, a hybrid defogging technique is proposed that integrates
the basic concepts of DCP and BCP. The proposed technique uses DBCP
instead of a single BCP. The 2D order statistic filter is used to preserve
the edge information. The proposed technique is tested on 10 well-known
benchmark foggy images. Results reveal that the proposed technique
outperforms the existing techniques in terms of the ratio of average
gradient and saturated pixels by 1.0638 and 1.6931%, respectively. It is
able to resolve the sky-region problems that are associated with DCP. The
proposed algorithm also removes halo and artifact effects from the
restored image.


KEYWORDS 


• dark channel prior
• bright channel prior
• filter
• defogging


ABSTRACT


This chapter is a comprehensive literature review and comparative study
of machine learning algorithms in Parkinson’s disease diagnosis. Recent
studies in the literature that were conducted on different datasets,
containing both handwriting and voice data of Parkinson’s disease, are
analyzed. Because Parkinson data are mostly suitable for machine learning
analysis, many authors are drawn to research in this area. The Parkinson
detection literature is inclining toward deep learning algorithms because
of their automatic anomaly detection capability, and recent studies are
moving toward automated disease detection and classification systems.
Therefore, this chapter also aims to include papers that use deep
learning methods for Parkinson’s disease diagnosis. The authors believe
that it will serve as a useful handbook for researchers who are eager to
carry out research on this subject.


2.1 INTRODUCTION 


Parkinson’s disease (PD), a chronic and progressive disease, generally
presents with tremor, which occurs in most PD patients. Involuntary
shaking may occur in the hands, arms, legs, and chin. However, the
uncontrolled movement of the thumbs is one of the most common symptoms.
Of course, not every hand tremor is a sign of PD; to make this diagnosis,
a general check-up by experts is required. The slowing of movements is
another quite common symptom of PD.' Unfortunately, over time the
patients become unable to perform the movements necessary for daily life.
As they walk, their steps may shorten and they may begin to lean forward.
Apart from these most common symptoms, speech changes, handwriting
deterioration, posture deterioration, sudden movements while sleeping,
and bowel disorders are other symptoms of PD.’

PD, which is more common in older people, has spread worldwide due to the
modern lifestyle. It is the second most common neurodegenerative disorder
after Alzheimer’s disease’ and affects approximately 10 million people
worldwide.* The disease limits a person’s speaking skills and causes
tremors in hand movements as well as movement and muscle problems in
general. PD reduces the standard of living of patients and naturally
affects their families. Non-invasive methods are more suitable for these
people because most of them are not in good physical condition. The most
common non-invasive methods used in clinics for Parkinson’s are
handwriting and voice (speech) tests.° Non-invasive techniques generally
refer to disease diagnosis methods that do not require surgical
intervention.

The datasets collected by these non-invasive methods are generally
suitable for analysis with machine learning techniques. Many studies in
the literature address the diagnosis of PD using different techniques.
Since there is no specific rule for choosing machine learning techniques
and optimizing their parameters, trial-and-error approaches are often
used.° Therefore, experiments with different machine learning methods
will enrich and improve the literature.

There are many different kinds of articles and studies in the Parkinson’s
literature, and many machine learning methods can be applied to Parkinson
datasets. In the recent literature, accuracy is usually used to evaluate
the efficiency of the methods. However, there are many other machine
learning performance measures, such as the F1 score, sensitivity, and the
confusion matrix.*?*
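
The evaluation measures named above can all be derived from the four counts of a binary confusion matrix. The following sketch uses the standard definitions; the function names and the label vectors are hypothetical and are not taken from any of the reviewed studies.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = Parkinson's disease, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, sensitivity (recall), precision, and F1 score."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Hypothetical labels for ten subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = scores(y_true, y_pred)
```

In this toy case the accuracy is 0.8 while the sensitivity is only 0.75, which shows why accuracy alone can hide missed patients when the classes are unbalanced.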

In this chapter, the Background section presents some useful information
about voice and handwriting datasets; additionally, it covers the
possible treatments of PD. The Literature Review section summarizes novel
and valuable papers that use various classification methods. The
Solutions and Recommendation section provides a technical analysis of the
previous section in order to find the optimum solutions to the
classification problems. Finally, the last section states the conclusion
of this chapter.


2.2 PARKINSON’S DISEASE
2.2.1 BACKGROUND


To make the literature review section easier to follow, this section
introduces the voice and handwriting datasets that form the basis of the
research. Although the methods of collecting the datasets differ from
each other, they all aim to represent the differences between people with
PD and healthy individuals. In addition, the treatment methods used to
slow and delay the disease process are briefly mentioned in this section.
Modern medical approaches have made significant progress in this area,
especially in the recent period.


2.2.2 PARKINSON'S DISEASE VOICE DATA 


As mentioned above, many people with PD depend considerably on clinical
care. The physical visits to the clinic that are essential for diagnosis
and treatment are stressful for many people with PD. Research has shown
that the most critical symptoms of PD are dysphonia, gait anomalies, and
handwriting tremors. According to the literature, approximately 90% of
people with PD exhibit some form of vocal deterioration. The voice of
people with PD typically has some sound anomalies, which are called
dysphonia symptoms. Dysphonia is a general term that refers to disorders
of the voice, and it covers different aspects such as pathological or
functional problems with one’s voice.’

In the case of PD, the voice sounds husky, tense, or labored. Sometimes,
the patient’s voice may become so hoarse and abnormal that the listener
has difficulty understanding the patient’s speech. However, voice
disorders may also have other causes, such as nodule-related disorders of
the vocal cords, unexpected vocal complications in the post-operative
stage, or unexpected ulcers on the vocal cords. Likewise, misuse of the
voice can lead to vocal disorders, for instance using the voice too high
or too low, or using it with inadequate breathing support or poor
posture. Some dysphonia seems like a cross between misuse and something
physiological.®

When the studies analyzing voice data were examined, it was found that
the operations performed were generally aimed at detecting abnormal
characteristics in the voice signals. In these studies, the speech
datasets consist of standard speech tests recorded with a microphone, and
the data are analyzed by measurement methods (implemented as software
algorithms) to detect certain properties of these signals. Table 2.1
illustrates some features of a voice dataset that has become a standard
in this field.” Some preprocessing operations can be performed on these
raw data in order to extract discriminative features, or the data can be
prepared in the right format for the automatic feature selection and
learning algorithms of deep learning architectures.


TABLE 2.1 Voice Dataset Features.


Attribute name      Description
MDVP:Jitter(Abs)    Variation in fundamental frequency
Jitter:DDP          Variation in fundamental frequency
MDVP:APQ            Measure of variation in amplitude
Shimmer:DDA         Measure of variation in amplitude
NHR                 Ratio of noise to tonal components
HNR                 Ratio of noise to tonal components
RPDE                Dynamic complex measurement
DFA                 Signal fractal scaling exponent
D2                  Dynamic complex measurement
PPE                 Non-linear measure of fundamental frequency
Status              Status of the patient: 1 = Parkinson’s disease, 0 = healthy
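
To give a feel for what the frequency and amplitude variation features in Table 2.1 measure, the sketch below computes an absolute jitter and a simple local shimmer from per-cycle measurements. These are commonly used definitions stated here as assumptions; they are not identical to the MDVP APQ, DDP, or DDA variants listed in the table, and the function names and sample values are hypothetical.

```python
def absolute_jitter(periods):
    """Mean absolute difference between consecutive glottal periods."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)


def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive peak amplitudes,
    divided by the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


# Hypothetical per-cycle measurements of a sustained vowel.
periods = [0.0080, 0.0082, 0.0079, 0.0081]   # seconds per glottal cycle
amplitudes = [1.0, 0.5, 1.0, 0.5]            # peak amplitude per cycle

jitter = absolute_jitter(periods)
shimmer = local_shimmer(amplitudes)
steady = absolute_jitter([0.0080] * 4)       # perfectly regular voice
```

A perfectly regular voice yields zero jitter, while cycle-to-cycle irregularity in period or amplitude raises the respective measure.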


2.2.3 PARKINSON'S DISEASE HANDWRITING DATA 


Handwriting tests have been among the most widely used non-invasive
methods in recent years. The idea of collecting data from handwriting
tests to detect PD has led this literature to develop in other
directions, and it has been enriched with image processing and machine
learning structures. While the analysis of voice data is based on older,
traditional signal processing methods, handwriting tests can be an
excellent alternative method for PD diagnosis.

Many researchers use spiral and meander drawings to set a standard in the
handwriting datasets that are created in many research hospitals around
the world. For instance, Figure 2.1 shows several spiral test samples in
which (a) and (b) belong to healthy people and (c) and (d) are drawings
taken from people with PD.'! However, when the literature is examined,
studies with very different structures and methods can be observed. Since
the collection of handwriting data is more practical, it is more widely
used today.


FIGURE 2.1 Spiral test samples: (a) and (b) healthy people, (c) and (d) people with PD.


Handwriting is a complex activity entailing cognitive, kinesthetic, and 
perceptual-motor components,’ the changes in which can be a promising 
biomarker for the evaluation of PD.'? Indeed, there is evidence to suggest 
that the automatic discrimination between unhealthy and healthy people 
can be accomplished based on several features obtained through simple 
and easy-to-perform handwriting tasks.'* Developing a handwriting-based 
decision support system is desirable, as it can provide a complementary,
non-invasive, and very low-cost approach to the standard evaluations 
carried out by clinical experts. 

The dynamic aspects of the handwriting process are useful for analyzing
the potential of automatic handwriting systems for PD detection. The
dynamic features of handwriting drawing data include the X, Y, and Z
coordinates, pressure, grip angle, and timestamp.'° For instance, the
features X and Y and their respective pressure features can be useful for
solving the PD classification problem.'° Dynamic handwriting analysis
benefits from the use of digitizing tablets and electronic pens. With
these devices, it is straightforward to measure the temporal and spatial
variables of handwriting, the pressure exerted on the writing surface,
the pen inclination, the movement of the pen while it is not in contact
with the surface, etc.
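
A few such dynamic measurements can be derived from raw tablet samples as sketched below. The sample layout (x, y, pressure, timestamp), the convention that zero pressure means the pen is in the air, and the three derived features are assumptions chosen for illustration; the reviewed studies define their own feature sets.

```python
import math


def dynamic_features(samples):
    """samples: list of (x, y, pressure, t); pressure 0 means pen in air."""
    path = 0.0
    in_air_time = 0.0
    for (x0, y0, p0, t0), (x1, y1, p1, t1) in zip(samples, samples[1:]):
        path += math.hypot(x1 - x0, y1 - y0)
        if p0 == 0 and p1 == 0:
            in_air_time += t1 - t0
    duration = samples[-1][3] - samples[0][3]
    on_surface = [p for _, _, p, _ in samples if p > 0]
    return {
        "mean_speed": path / duration,
        "mean_pressure": sum(on_surface) / len(on_surface) if on_surface else 0.0,
        "in_air_ratio": in_air_time / duration,
    }


# Hypothetical recording: two samples on the surface, then two in the air.
samples = [(0, 0, 0.5, 0.0), (3, 4, 0.7, 1.0), (3, 4, 0.0, 2.0), (6, 8, 0.0, 4.0)]
features = dynamic_features(samples)
```

Features of this kind summarize how fast, how firmly, and how continuously a subject draws, which is the information a classifier would receive in place of the static image.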

Generally, traditional machine learning, mathematical, statistical,
and feature selection algorithms such as optimum path forest (OPF),
support vector machines (SVM), naive Bayes (NB), gray wolf optimization,
cuttlefish optimization, particle swarm optimization, visual data
augmentation, Gaussian mixture model, K-nearest neighbor (KNN),
random forest, decision trees, time series-based feature images, artificial
neural networks (ANN), self-organizing map (SOM), radial basis function
(RBF), linear SVM, Ripper k, fuzzy-KNN, and fuzzy C-means are used
for PD handwriting diagnosis. Nowadays, machine learning and deep
learning are often used for the classification of medical images that belong
to Parkinson patients.'’ In addition, different types of ANN and different
architectures of convolutional neural networks such as cifar10, ImageNet,
LeNet, ResNet, and VGG16 are used for PD handwriting classification.'®


2.2.4 PARKINSON'S DISEASE TREATMENT 


The main goal in the treatment of PD is to enable the patient to become
active, independent, and able to do his or her own work. There is
currently no definitive treatment. However, the limited number of
medications in use (which either provide dopamine, produce a
dopamine-like effect, or increase the availability of dopamine by
inhibiting its breakdown in the brain) are aimed at controlling symptoms.
Smart exercise practices, balance exercises, and lifestyle changes can be
beneficial. Speech and language therapists may also be helpful for
patients with speech disorders. However, the disease itself cannot be
cured, and the symptoms may persist despite drug use and rehabilitation.

Accordingly, since there is no specific medical treatment for PD, the
gradual decline of the patient can only be managed during the disease
progression. Therefore, it is essential to detect the disease in its
early stages with machine learning and deep learning methods, because an
early diagnosis of PD could be crucial for the prospect of medical
treatment; likewise, it is vital for evaluating the effectiveness of new
drug treatments at prodromal stages.!”


2.3 OVERVIEW OF PARKINSON’S DISEASE LITERATURE
2.3.1 LITERATURE REVIEW 


Many machine learning and deep learning methods have been deployed
for PD detection from voice, gait, and handwriting datasets. For instance,
Bernardo et al.[20] introduced a PC app for PD detection. The C#-based
interface was designed to capture data from patients; the authors also
developed algorithms for feature extraction and introduced novel samples
for the handwriting test, such as a spiral, a triangle, and a cube.
Several preprocessing steps, namely color thresholding, RGB-to-grayscale
conversion, de-noising of the pattern, and skeletonization, were applied
before feature extraction. The features of the dataset were Euclidean
distance, relative distance, circular distance, Manhattan distance, mouse
pointer speed, and the similarity between pixels. Optimum path forest
(OPF), SVM, and NB were the classifiers. In this work, the authors
reported 100% accuracy with the SVM classifier.
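
As an illustration of the distance features listed above, the following minimal sketch computes the mean Euclidean and Manhattan distance between a template and a drawn trace. The pairing of template and drawing points, the function names, and the sample coordinates are assumptions made for this example; they are not taken from the cited study.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of absolute coordinate differences between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mean_distance(template, drawing, dist):
    """Average distance between paired template and drawing samples."""
    if not template or len(template) != len(drawing):
        raise ValueError("both traces need the same non-zero number of points")
    return sum(dist(t, d) for t, d in zip(template, drawing)) / len(template)


# Hypothetical data: three template points and the matching patient samples.
template = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
drawing = [(0.0, 0.0), (1.0, 0.5), (4.0, 5.0)]

features = {
    "euclidean": mean_distance(template, drawing, euclidean),
    "manhattan": mean_distance(template, drawing, manhattan),
}
print(features)
```

A larger mean distance indicates a drawing that deviates more from the template; a classifier such as SVM would receive these values as part of its feature vector.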

Pereira et al.[11] mainly used preprocessing methods to distinguish the
template from the patient drawings in paper-based tests; the preprocessing
stages were color thresholding, a blur filter, a median filter, and
capturing the pattern of the handwritten drawings. Features such as RMS,
maximum difference (argmax), minimum difference (argmin), standard
deviation, and Mean Relative Tremor (MRT) were extracted from the images.
In a comparison of the OPF, NB, and SVM classifiers, the SVM classifier
reached 67% accuracy. The handwriting dataset collected by the authors
consists of spirals and meanders captured from paper-based tests.

In another study, Pereira et al.[21] designed a method for extracting
feature images from handwriting drawings. The authors extended their
dataset to six tests (circle on paper, circle in the air, diadochokinesis
with the right hand, diadochokinesis with the left hand, meander, and
spiral) together with time series-based images. The main purpose of this
work was to produce feature images from the raw data by normalizing the
signals, reshaping them into square matrices, and rendering the matrices
as greyscale images for use as CNN inputs. The data were collected with a
digitized pen and a tablet; the recorded signals were microphone, finger
grip, axial pressure of the ink refill, x, y, and z. Different CNN
architectures, such as Cifar10 and ImageNet, were used for feature
extraction. OPF, NB, and SVM classifiers were then applied, and an
accuracy of 95% was reached.

In another work, Harihar et al.[22] designed a hybrid intelligent system
for accurate PD diagnosis. The work consisted of several stages: the
feature preprocessing stage used model-based clustering (Gaussian mixture
model), and the feature reduction/selection stage used principal
component analysis (PCA), linear discriminant analysis (LDA), sequential
forward selection (SFS), and sequential backward selection (SBS). The
Parkinson dataset of the University of California-Irvine (UCI), which
consists of voice signals, was used. Voice signal features such as MDVP,
NHR, HNR, RPDE, D2, DFA, Spread1, Spread2, and PPE were analyzed.
Least-squares SVM (LS-SVM), probabilistic neural network (PNN), and
general regression neural network (GRNN) were used as the classifiers.
An accuracy of 100% was reported in this work.

Some researchers analyzed only the visual attributes of images. For
instance, Moetesu et al.[23] assessed the visual attributes of a
handwriting dataset for PD classification. The visual attributes of the
images were intensified through a novel approach that can be regarded as
a form of data augmentation; three representations of each image were
combined across eight task tests: a raw image network, a median residual
network, and an edge image network. The authors reported that the CNN-SVM
model reached 83% accuracy with a voting decision system. The paper-based
dataset contained the tests spiral, l, le, les, lektorka, porovnat,
nepopadnout, and sentence.

The work of Drotar et al.[16] was based on evaluating a dataset of
kinematic and pressure features for PD detection. The handwriting
features were collected with a tablet and a smart pen during the tests,
which consisted of drawing an Archimedean spiral, repetitively writing
orthographically simple syllables and words, and writing a sentence.
The kinematic features analyzed in this work included stroke speed,
speed, velocity, acceleration, jerk, horizontal velocity/acceleration/
jerk, vertical velocity/acceleration/jerk, number of changes in velocity
direction (NCV), number of changes in acceleration direction (NCA),
relative NCV, relative NCA, on-surface time, and normalized on-surface
time. KNN, ensemble AdaBoost, and SVM classifiers were used for
classification, and the highest accuracy was 82%.
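
The kinematic features named above can be approximated from sampled pen positions with finite differences. The sketch below is only an illustration under stated assumptions: it treats one coordinate of the pen trace, and it reads NCV and NCA as the number of sign changes in velocity and acceleration, which may differ from the exact definitions used in the cited work. All names and sample values are hypothetical.

```python
def derivative(values, times):
    """Finite-difference derivative between consecutive samples."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def direction_changes(signal):
    """Count sign changes between consecutive non-zero samples."""
    signs = [1 if v > 0 else -1 for v in signal if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def kinematic_features(x, t):
    """Velocity, acceleration, jerk, and direction-change counts for one axis."""
    velocity = derivative(x, t)
    acceleration = derivative(velocity, t[1:])
    jerk = derivative(acceleration, t[2:])
    return {
        "velocity": velocity,
        "acceleration": acceleration,
        "jerk": jerk,
        "ncv": direction_changes(velocity),
        "nca": direction_changes(acceleration),
    }


# Hypothetical horizontal pen positions sampled once per time unit.
x = [0.0, 1.0, 3.0, 2.0, 2.0, 5.0]
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
features = kinematic_features(x, t)
print(features)
```

In practice such per-sample signals are usually summarized (for example by mean, median, or standard deviation) before being passed to a classifier.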

To create novel hybrid models for PD detection, Gupta et al.[24,25]
took advantage of the grey wolf and cuttlefish optimization algorithms as
search strategies for feature selection. These studies used a modified
grey wolf optimization, which updates the hunters’ positions in an
optimal way, and an optimized cuttlefish algorithm for feature selection,
which consists of four groups: global solution, local search, local
solution, and random solution. Random forest, KNN, and decision tree
classifiers were applied for classification, and 94% accuracy was
obtained. The main goal of these works was to find the optimal subset of
features. Different datasets composed of handwriting, voice, and gait
information of patients were analyzed.

The performance of different CNN-based architectures was evaluated by
Pereira et al.[26] The authors designed tests for collecting time-series
features of handwriting drawings from patients in order to produce
feature images. The raw data, collected with a tablet and a smart pen,
consisted of microphone, finger grip, axial pressure of the ink refill,
x, y, and z signals, from which the feature images were built. The
authors used different CNN models, such as Cifar10, ImageNet, and LeNet,
together with OPF for classification and reached 85% accuracy. Spirals
and meanders were the tasks of the proposed tests.
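
The idea of turning a time series into a feature image can be sketched as follows: normalize the samples to the greyscale range, then arrange them row by row into the largest square matrix that fits. This is a simplified, assumed reading of the procedure; the function name and the choice to discard leftover samples are illustrative and not taken from the cited papers.

```python
import math


def series_to_image(series):
    """Map a 1-D signal to a square greyscale image (list of pixel rows)."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    pixels = [round(255 * (v - lo) / span) for v in series]
    side = math.isqrt(len(pixels))  # side of the largest square that fits
    pixels = pixels[: side * side]  # assumption: leftover samples are dropped
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical pen-pressure samples.
signal = [0.0, 1.0, 2.0, 3.0, 5.0]
image = series_to_image(signal)
print(image)
```

Each recorded channel (for example pressure or grip) could be converted this way and the resulting images stacked or fed separately to a CNN.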

In another study, Spadoto et al.[27] analyzed the Oxford PD Detection
Dataset, which contains voice signals of PD patients. Several
preprocessing methods were designed to prepare the dataset for different
classifiers, namely OPF, SVM-RBF, SVM-LINEAR, ANN-MLP, SOM, and KNN,
and an accuracy of 75% was reached. Traditional voice dataset features
such as MDVP, NHR, HNR, RPDE, D2, DFA, Spread1, Spread2, and PPE were
used.

In another article, Diaz et al.[28] proposed the dynamic enhancement
of static images of handwriting tests. A dynamically enhanced static
image draws the sampled points instead of linking them, so that some
kinematic (temporal/velocity) information can be recovered. The primary
goal of this study was to construct enhanced images from the raw data;
raw images, median-filtered images, and edge images were combined for
voting classification. The paper-based dataset was composed of tests
such as drawing a spiral, writing l, le, les, lektorka, porovnat, and
nepopadnout, and writing a sentence. Different classifiers, such as
linear SVM, SVM-RBF, RF, ET, and ADA, were used for the classification,
and 88% accuracy was obtained.

In a novel approach, Loconsole et al.[29] adapted an EMG signal
detection tool for PD detection. The authors collected a handwriting
dataset with three tasks: one sentence task and two drawing tasks. In
this work, features such as density ratio, height ratio, execution time,
average linear speed of execution, acceleration norm, gyroscope
components, and RMS were extracted from the EMG signals. An ANN with an
optimized topology and an SVM were used as classifiers, and 89% accuracy
was reached. A computer vision-based handwriting analysis tool and
surface electromyography (sEMG) signal-processing techniques were the
central aspects of this work.

In Graça et al.,[30] a mobile app was designed for collecting data from
patients. The main aspects of this work were the development of the
mobile app for online handwriting tests and the analysis of gait. The
mobile app could extract drawing features such as Spiral Average Error,
Spiral Cross, Spiral Pressure Ratio, Spiral Side Ratio, Tap Time Ratio,
and Tap Pressure Ratio. Decision tree, Ripper k, and Bayesian network
classifiers were used, reaching 85% accuracy.

In Drotar et al.,[31] an in-air movement-based data collection method
was used to build the Parkinson dataset. Online in-air and on-surface
movement-based features were analyzed with an SVM classifier, and 85%
accuracy was obtained. The tests included spiral drawing and word
writing tasks.

Shahbaba et al.[32] proposed a new mathematical approach, dpMNL (a
multinomial logit model based on Dirichlet process mixtures), for PD
classification on the voice dataset. The proposed model used Dirichlet
process mixtures, which allow the relationship between the distribution
of the response variable and the covariates to be modeled
non-parametrically. The model is generative, which gives it advantages
over traditional multinomial logit (MNL) models, which are
discriminative. Five-fold cross-validation was used to evaluate the
performance of the model, and 87.7 ± 3.3% accuracy was achieved.

In another study, Psorakis et al.[33] investigated the classification
ability of improved mRVMs (multiclass multi-kernel relevance vector
machines) on real-world datasets such as the Parkinson dataset. The
research team introduced improvements such as convergence measures,
sample selection strategies, and model refinements, and evaluated the
results by 10-fold cross-validation with 10 repetitions.

In another work on the Parkinson voice dataset, Little et al.[34]
proposed detecting dysphonia as a means of PD detection. The authors also
proposed a novel dysphonia measure, pitch period entropy (PPE), in
addition to the usual speech features. The primary approach of this work
was an exhaustive search of all possible combinations of dysphonia
measures to find the optimum result. The combination of the particle
swarm optimization algorithm and the OPF classifier was also used for
classification. In the experiments, the combination of a pre-selection
filter and exhaustive search with an SVM classifier reached 91.4 ± 4.4%
classification accuracy.

Spadoto et al.[35] introduced evolutionary-based techniques, namely
particle swarm optimization (PSO), the harmony search (HS) algorithm,
and the gravitational search algorithm (GSA), to maximize the
performance of the OPF classifier. The authors analyzed the Oxford PD
Detection Dataset. While the OPF classifier alone reached 71%
classification accuracy, the combination of PSO and OPF reached 73%, and
the combinations of HS and GSA with OPF performed slightly better than
the others (84.01%).

Sakar et al.[36] detected PD from dysphonia measures. The main aspects
of this research were selecting the optimum subset of features and
building a model with minimal bias. To this end, the authors calculated
the relationship between the features and the PD score statistically,
using maximum-relevance-minimum-redundancy (mRMR) feature selection and
SVM classifiers. The authors used the leave-one-out method to evaluate
how well the proposed model generalizes.

Das[37] compared different machine learning approaches, namely neural
network, DMneural, regression, and decision tree, for PD classification.
In the experiments, the neural network performed better than the other
models. Different training/testing ratios of the dataset were used to
evaluate the models. Neural networks achieved 92.9% classification
accuracy when a random split assigned 65% of the dataset to training and
35% to testing.

Guo et al.[38] suggested a combination of genetic programming and the
expectation-maximization algorithm (GP-EM) to transform the data of the
PD dataset. The model was applied to the voice dataset, using flexible
and effective learning modules and a Gaussian mixture model of the data.
Ten-fold cross-validation was used to validate the performance of the
model. The mean accuracy of the GP-EM method was 93.1 ± 2.9%.

Tsanas et al.[39] tried different combinations of feature selection
algorithms and classifiers and compared them to find the best one. The
feature selection algorithms were the least absolute shrinkage and
selection operator (LASSO), minimum redundancy maximum relevance (mRMR),
RELIEF, and local learning-based feature selection (LLBFS). The selected
features were classified by random forest and SVM classifiers. Using
only 10 dysphonia features, the overall accuracy was around 99%.

Astrom et al.[40] introduced a novel approach to using neural networks
for medical data processing. The central aspect of this article was to
run more than one neural network in parallel in order to reduce error.
The outputs of the different networks were weighted by a rule-based
system to form a final decision for PD classification. The parallel
design allowed data that one network failed to learn to be learned by
another. The results revealed that the parallel system improved the
robustness of the classification procedure. The parallel system was
composed of nine different neural networks and improved on the ordinary
classification rate by almost 10%. The suggested model achieved 91.20%
accuracy.
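
Combining the outputs of several classifiers into one decision can be illustrated with a weighted vote. The sketch below is a generic example, not the rule-based system of the cited article: the weights, the label encoding (0 = healthy, 1 = PD), and the function name are assumptions.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (same order).
    Ties are resolved in favour of the label seen first.
    """
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("need one weight per prediction")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three networks for one patient record.
predictions = [1, 0, 0]
trusted = weighted_vote(predictions, [0.9, 0.3, 0.4])   # first network dominates
majority = weighted_vote(predictions, [1.0, 1.0, 1.0])  # plain majority vote
print(trusted, majority)
```

With equal weights the function reduces to a simple majority vote, which corresponds to the voting decision reported in several of the handwriting studies.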

Chen et al.[41] suggested a fuzzy KNN-based system (FKNN) and compared
it with SVM. The dataset consisted of a range of biomedical voice
measurements obtained from 31 people, 23 of whom had PD. PCA was used
for dimension reduction. The best classification accuracy (96.07%) was
obtained by the FKNN-based system using 10-fold cross-validation, which
suggests it could serve as a reliable diagnostic model for the detection
of PD.
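
For readers unfamiliar with fuzzy KNN, the sketch below shows the commonly used inverse-distance membership rule, in which each of the k nearest neighbours votes for its class with weight 1 / d^(2/(m-1)). It assumes crisp training labels and a fuzzifier m = 2; the data, names, and parameter values are hypothetical and are not those of the cited system.

```python
import math


def fuzzy_knn(train, query, k=3, m=2.0, eps=1e-9):
    """Return class memberships of `query` as a dict that sums to 1.

    train: list of (feature_vector, label) pairs with crisp labels.
    m: fuzzifier (> 1) controlling how strongly distance is penalized.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = {}
    total = 0.0
    for features, label in neighbours:
        distance = math.dist(features, query)
        w = 1.0 / (distance ** (2.0 / (m - 1.0)) + eps)  # eps guards d == 0
        weights[label] = weights.get(label, 0.0) + w
        total += w
    return {label: w / total for label, w in weights.items()}


# Hypothetical one-feature training set.
train = [((0.0,), "healthy"), ((1.0,), "healthy"), ((3.0,), "PD"), ((4.0,), "PD")]
memberships = fuzzy_knn(train, (2.0,), k=3)
print(memberships)
```

Unlike plain KNN, the result is a graded membership per class rather than a single label, so borderline records can be flagged for closer inspection.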

Ozcift[42] analyzed a Parkinson’s disease voice dataset composed of
recordings from 31 people, 23 of them with PD, where each record has 22
features. A linear SVM was used to select the most valuable subset of
features (10 features). Three evaluation measures were considered in the
experiments: accuracy, Kappa Error (KE), and area under the receiver
operating characteristic (ROC) curve. Two baseline classifiers, IBk (a
KNN variant) and KStar (a kind of KNN), were used for comparison with
the two main classifiers. By applying an RF ensemble to the
classification, an accuracy of around 97% was obtained.
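
Agreement-based measures of this kind can be computed directly from true and predicted labels. The sketch below computes accuracy together with Cohen’s kappa statistic, on the assumption that this is the quantity behind the Kappa Error measure named above; the labels are hypothetical.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equally long label lists."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: only one label ever occurs
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = PD, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(acc, kappa)
```

Kappa discounts the agreement that would occur by chance, which matters for datasets such as the one above where PD recordings outnumber healthy ones.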

Mandal et al.[43,44] introduced a new dysphonia measure called the
severity of the disease. The authors used Haar wavelets as the
projection filter and multinomial logistic regression and linear
logistic regression as the classifiers. Feature selection mainly relied
on SVM and ranker search methods. The authors compared many conventional
approaches from the literature for PD classification, such as Bayesian
network, SVM, ANN, boosting methods, and linear and multinomial logistic
regression. The authors reported that the study reached 100%
classification accuracy.

Zuo et al.[45] proposed a particle swarm optimization (PSO) algorithm
for parameter optimization and feature selection. The classifier used in
this research was an FKNN. The proposed model, a combination of PSO and
FKNN, was evaluated through 10-fold cross-validation on a PD voice
dataset from the UCI database. The average accuracy was 97.47%.

Luukka et al.[46] introduced a hybrid model for PD detection. The model
combined fuzzy entropy-based feature selection with a similarity
classifier. This combination improved efficiency by simplifying the
dataset and accelerating the classification process. The hybrid model
reached a high level of accuracy (mean value = 85.03%).

In another work, Li et al.[47] compared optimization approaches by
analyzing different medical datasets. The primary purpose of this work
was to find the optimum feature set for better classification results: a
fuzzy-based non-linear transformation method was designed for selecting
the optimum feature subset from the PD dataset. The authors also
compared the proposed feature selection method with principal component
analysis (PCA) and kernel principal component analysis (KPCA) to
illustrate its efficiency. The proposed classification approach was
applied to several datasets: Pima Indians diabetes, Wisconsin diagnostic
breast cancer, Parkinson’s disease, echocardiogram, BUPA liver
disorders, and bladder cancer cases. In conclusion, the fuzzy-based
non-linear transformation with an SVM classifier performed better than
the other methods (93.47% accuracy).

Ozcift et al.[48] proposed computer-aided diagnosis (CADx) systems to
improve accuracy. The authors combined rotation forest (RF) with 30
machine learning algorithms to diagnose disease from heart, diabetes,
and Parkinson’s disease datasets. The RF classifier achieved an accuracy
(ACC), KE, and area under the receiver operating characteristic (ROC)
curve (AUC) of 74.47, 80.49, and 87.13%, respectively.

Khatamino et al.[49] proposed an efficient convolutional neural network
for PD classification. The generalization ability of the model was
illustrated by comparing it with conventional machine learning
classifiers such as SVM and NB. One of the main purposes of this work
was to show the discriminative power of the novel DST test. Another was
to illustrate the flexibility and powerful feature learning ability of
CNNs by comparison with SVM and NB. Two approaches were selected for
evaluating the performance of the proposed model: cross-validation (CV)
and leave-one-out cross-validation (LOOCV). The proposed model performed
effectively on two different handwriting datasets and reached 88.89%
classification accuracy with a 75% training and 25% testing split.

Indira et al.[9] utilized two methods for PD classification: fuzzy
C-means clustering and ANN. The fuzzy C-means clustering method
achieved 68.04% accuracy, 75.34% sensitivity, and 45.83% specificity.
The ANN, which was optimized by filtering methods and PCA, achieved 92%
mean accuracy. A PD voice dataset was analyzed in this work.
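
Accuracy, sensitivity, and specificity all follow from the four cells of a binary confusion matrix. The following minimal sketch shows the arithmetic; the labels are hypothetical and the function name is chosen for this example only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of PD cases that were detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
    }


# Hypothetical labels: 1 = PD, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The example also shows why accuracy alone can mislead on imbalanced data: a model can score well on accuracy and sensitivity while clearing only half of the healthy subjects.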


2.3.2 SOLUTIONS AND RECOMMENDATIONS 


This chapter surveyed studies that used machine learning algorithms in
PD diagnosis. In order to analyze the factors affecting the success rate
of the proposed algorithms, the studies are summarized in Table 2.2 in
terms of classification methods and classifier types, years, datasets,
and accuracy rates.

One result of this literature review is that, in many studies,
researchers have tended to collect PD data in collaboration with
research hospitals. The ideas and guidance of the doctors and medical
experts of neurology departments are evidently essential.

In general, high accuracy percentages have been achieved in the
literature in the last few years. One significant reason for this
performance growth is the improvement of the machine learning and deep
learning libraries available in different programming languages.

The analysis of handwriting data often shifts to the field of image
processing. Therefore, some researchers are trying new deep learning
methods rather than older image processing techniques in current
studies. Some studies have shown successful results with the CNN
structure, and this literature review also suggests that the
discriminative power of CNNs is useful for this topic. The reason is
that the new methods provide automatic pattern detection filters instead
of requiring manually designed filters. It is more practical to use the
self-learned adaptive filters of a CNN than conventional image
processing methods for feature extraction. Moreover, the user-friendly
interfaces of programming IDEs facilitate model creation.


TABLE 2.2 Literature Review Summary.

| Study | Method & classifier type | Data description | Accuracy (%) |
|---|---|---|---|
| [34] | PSO + OPF, bootstrap SVM, pre-selection filter + exhaustive search + SVM | Oxford Parkinson’s Disease dataset features plus Pitch Period Entropy (PPE) | 91.4 ± 4.4% (bootstrap with 50 replicates) |
| [32] | Non-linear Dirichlet mixtures, dpMNL, decision trees, SVM | Parkinson’s disease voice dataset | 87.7 ± 3.3% (five-fold CV) |
| [27] | OPF, SVM-RBF, SVM-LINEAR, ANN-MLP, SOM, KNN | Speech dataset: MDVP, NHR, HNR, RPDE, D2, DFA, Spread1, Spread2, PPE | 75.37 ± 3.58% (random test data) |
| [33] | Multiclass multi-kernel relevance vector machines (improved mRVMs) | Oxford Parkinson’s Disease dataset | 89.55 ± 6.6% (10-fold CV) |
| [36] | mRMR + SVM | Oxford Parkinson’s Disease Detection dataset: MDVP, NHR, HNR, RPDE, D2, DFA, Spread1, Spread2, PPE | 92.8 ± 1.2% (bootstrap with 50 replicates) |
| [37] | ANN | Parkinson’s disease speech dataset | 92.9% (65% training and 35% testing) |
| [38] | GP-EM | Oxford Parkinson’s Disease Database (OPDD) | 93.1 ± 2.9% (10-fold CV) |
| [48] | CFS-RF | Parkinson’s disease voice dataset: MDVP, Jitter DDP, APQ3 | 87.1% (10-fold CV) |
| [47] | Fuzzy-based non-linear transformation + SVM, PCA, KPCA | Different sorts of medical datasets and Parkinson’s disease voice dataset | 93.47% (hold-out) |
| [46] | Fuzzy entropy measures + similarity classifier | Parkinson’s disease voice dataset | 85.03% (hold-out) |
| [40] | Parallel NN | Oxford Parkinson’s Disease voice database | 91.20% (hold-out) |
| [35] | OPF, PSO + OPF, HS + OPF, GSA + OPF | Oxford Parkinson’s Disease Detection dataset: MDVP, NHR, HNR, RPDE, D2, DFA, Spread1, Spread2, PPE | PSO + OPF 73.53%; HS + OPF, GSA + OPF 84.01% (hold-out) |
| [39] | LASSO, mRMR, RELIEF, LLBFS, random forest, SVM | Parkinson’s disease speech dataset (dysphonia measures) | 99% (overall value on test data) |
| [42] | RF ensemble of IBk, SVM | Parkinson’s Disease voice signal dataset | 97% (test data) |
| [45] | PSO-FKNN | Parkinson’s disease voice dataset from UCI database | 97.47% (10-fold CV) |
| [9] | Fuzzy C-means, ANN | Speech signal dataset | Fuzzy C-means 68.04%; ANN 92% |
| [44] | Multinomial logistic regression, Haar wavelets | Parkinson’s Disease voice dataset | 100% (test data) |
| [41] | PCA-FKNN | Parkinson’s Disease speech dataset of UCI | 96.07% (average 10-fold CV) |
| [22] | Gaussian mixture model, PCA, LDA, SFS, SBS, LS-SVM, PNN, GRNN | Voice signals: MDVP, NHR and HNR, RPDE and D2, DFA, Spread1, Spread2, and PPE | 100% (test data) |
| [30] | Decision tree, Ripper k, Bayesian network | Handwritten drawings: Spiral Average Error, Spiral Cross, Spiral Pressure Ratio, Spiral Side Ratio, etc. | 86.67 ± 13.54% (10-fold CV) |
| [31] | Air movement-based data collection, SVM | Handwritten drawings: online in-air & on-surface movement-based features | 85.61% (test data) |
| [43] | Linear logistic regression, Haar wavelets | Parkinson’s Disease voice dataset | 100% (test data) |
| [11] | Color thresholding, blur, median, OPF, NB, SVM | Handwritten drawing dataset: RMS, argmax, argmin, standard deviation, MRT | 66.72 ± 5.33% (four-fold CV) |
| [16] | KNN, ensemble AdaBoost, SVM | Parkinson’s disease handwritten kinematic features dataset: stroke speed, velocity, acceleration, etc. | 82% (test data) |
| [26] | Cifar10, ImageNet, LeNet, OPF | Handwritten drawing: microphone, finger grip, axial pressure of ink refill, x, y, z, feature images | 85% (25% of test data) |
| [49] | CNN, NB, SVM | Parkinson’s disease handwritten drawings dataset, feature-based images | 88.89% (25% test data); 79.64% (10-fold CV) |
| [21] | Time series-based feature images, CNN: Cifar10, ImageNet, OPF, NB, SVM | Handwritten drawing: microphone, finger grip, axial pressure of ink refill, x, y, z, feature images | 95% (voting decision) |
| [25] | Optimized cuttlefish, KNN, decision tree | Parkinson hand, speech, voice datasets | 94% (mean value) |
| [20] | Image skeletonization by web app, OPF, SVM, NB | Handwritten drawing dataset collected by web app | 100% (test data) |
| [23] | Visual data augmentation, CNN-SVM | Parkinson’s disease handwritten dataset: raw, median, edge | 83% (voting decision) |
| [28] | Linking drawing points, SVM linear, SVM-RBF, RF, ET, ADA | Parkinson handwritten drawings: raw image, median filter, edge images | 88% (test data) |
| [29] | sEMG, optimal topology of ANN and SVM | Handwritten drawings: density ratio, height ratio, execution time, gyroscope components, RMS, etc. | 89% (test data) |
| [24] | Modified GWO, random forest, KNN, decision tree | Handwritten drawing, voice datasets | 94% (mean value) |


In the literature, the testing stage is generally performed by repeating
the evaluation on test data and taking the average value. In some cases,
however, cross-validation, leave-one-out cross-validation, and voting
decisions were used to evaluate the models. In addition, an early
stopping condition could be implemented in the learning process to make
training more efficient and to use resources optimally. Although many
measures can quantify the performance of machine learning methods,
accuracy comes to the forefront in the literature and is widely used for
performance comparison.
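
The evaluation protocols mentioned here differ mainly in how records are split into training and test sets. The sketch below generates k-fold splits (leave-one-out is the special case where k equals the number of samples) and summarizes fold scores as mean and standard deviation, the "mean ± std" form used in Table 2.2. The fold scores shown are invented placeholders, and the helper names are assumptions.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


splits = list(k_fold_indices(n_samples=10, k=5))       # 5-fold CV
loocv_splits = list(k_fold_indices(n_samples=4, k=4))  # leave-one-out

# Placeholder fold accuracies; a real run would train and test a model per split.
mean_acc, std_acc = summarize([0.9, 0.8, 1.0])
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```

Because each record is tested exactly once, cross-validated accuracy is usually a more conservative estimate than a single hold-out split, which is worth keeping in mind when comparing rows of Table 2.2 that use different protocols.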

It has been observed that the preprocessing stage makes a positive
contribution to the accuracy rate. Since there is no universal rule in
machine learning for choosing such methods, the literature will be
enriched if different preprocessing methods are applied. Considering all
the studies examined during this research, with any preprocessing
procedure (more than 50 works), studies on voice data resulted in an
average accuracy of 90.5%, and studies on handwriting data resulted in
an average accuracy of 87.8%. Figure 2.2 illustrates the average
accuracy values of all analyzed works.


FIGURE 2.2 Average accuracy percentage of studies.


When the literature is examined exhaustively, higher accuracy
percentages are observed in recent studies than in earlier ones. One
reason is that the machine learning libraries available in programming
languages have become more user-friendly. Another reason is that, as
time has gone on, many different methods have been tried in the
literature and increasingly optimized methods have emerged; researchers
have accordingly started to conduct more comprehensive and
solution-oriented studies by benefiting from these developments.

In the preprocessing sections of some studies, different mathematical
formulas are developed to create and extract novel attributes from raw
time-series data. These approaches attempt to increase the
discriminative power of the dataset. Raw data often need to be
transformed through some preprocessing procedure because the dataset is
not suitable for machine learning analysis by default. More than 50
studies were examined in the research process; Figure 2.3 shows that, in
general, there are more studies on voice data than on handwriting data.


FIGURE 2.3 Study distribution by datasets.


Figure 2.4 illustrates the methods used in the PD literature and the
percentage of the analyzed studies that use each of them. The SVM
classifier is generally used for classification on both voice and
handwriting data. The figure shows that the SVM, NB, and OPF classifiers
are used often in this literature.

Innovative approaches are included in this study, as well as attributes
and methods that have been standard in PD diagnosis for many years.
For instance, the attributes generally collected in a handwriting
dataset are x, y, z, pressure, grip angle, and timestamp. However, some
studies®! calculate attributes such as speed, acceleration, and RMS with
innovative formulas. Another example of creativity in the literature is
the creation of unique feature-based images from raw handwriting
data:*° new formulas are proposed for feature calculation, and the
features are then transformed into images so that CNN models can be used
for classification. Furthermore, as a completely different approach,
brain tomography of the patients has been used as input data for image
processing and classification models for PD diagnosis.*!°6


FIGURE 2.4 Study distribution by used methods.


2.3.3 FUTURE RESEARCH DIRECTIONS 


This section offers some practical and theoretical suggestions for
potential research directions. The authors have tried to select and
propose novel approaches in order to broaden readers’ perspective on the
PD classification literature.

Considering the successful results of CNN structures on PD data
classification, a 1-D CNN architecture can be implemented as a feature
learning model and classifier for signal-based time-series PD raw data.
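
To make the 1-D CNN suggestion more concrete, the sketch below shows the forward pass of a single 1-D convolutional filter followed by ReLU and global max pooling, written in plain Python. It is only an illustration of the building block: the kernel here is fixed by hand, whereas a real CNN would learn its kernels from data, and the signal values are hypothetical.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation form, as used in CNNs)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width)) + bias
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Reduce a feature map to its strongest response."""
    return max(values)


# Hypothetical tremor-like signal and a hand-set kernel that responds to rises.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(signal, kernel)
activated = relu(feature_map)
pooled = global_max_pool(activated)
print(feature_map, activated, pooled)
```

A full model would stack several such filters and layers and end in a small classifier; the pooled values play the role that hand-crafted features play in the conventional pipelines reviewed above.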

In order to create a practical and useful dataset, voice and handwriting
attributes can be combined as features for each PD patient. This
approach requires a tablet device and a microphone for gathering data.
In theory, this combination should boost a classification model’s
performance.

Voice and handwriting data collected from patients can be combined with
the patients’ brain tomography to form a hybrid PD dataset. The main
idea is to increase the discriminatory power of the dataset’s features
and to use a voting decision to obtain a better classification result.

Handling datasets in the preprocessing stage is a main aspect of the
work in this literature. Using different optimization algorithms will
enrich the literature. Comparing the results before and after the
preprocessing stage then shows the performance and importance of that
stage.

Because machine learning relies on trial and error, comparing different
hybrid models is very useful for finding the optimum model. Combinations
of different CNN architectures (for instance, Cifar10, ImageNet, LeNet,
ResNet, and VGG16) with highly recommended machine learning classifiers
(for instance, OPF, SVM, NB, KNN, random forest, decision trees, MLP,
ANN, SOM, RBF, linear SVM, Ripper k, fuzzy-KNN, and fuzzy C-means) can
be implemented on PD datasets in order to create novel and unique hybrid
classification models.

Additionally, when the dataset is in a time-series format, different
feature-based images can be extracted from it. In other words, if the
dataset is collected as signals, it can be shaped into whatever form of
information researchers want. The main purpose of this suggestion is to
utilize all the dataset features in the classification process.
Therefore, collecting the dataset as time-series features is highly
recommended.

Moreover, in some articles the authors introduce PC, mobile, and tablet
apps that collect online test information in order to create a dataset
from patients. If medical experts and researchers adopt some of these
tests as standards, such a system can eventually be turned into an online
PD detection system.

In addition, frequency plots of the voice signal could be used as CNN
input data in the form of images. Likewise, it will also be useful to
create hybrid architectures by using optimization algorithms such as
cuttlefish, grey wolf, or ant colony optimization.

Finally, some preprocessing approaches can improve the quality of raw
data and thus lead to more effective classification. For instance, many
different filters can be used in the preprocessing stage, such as a
salt-and-pepper noise filter, median blur, low-pass filter, mean filter,
Gaussian smoothing, conservative smoothing, Crimmins speckle removal,
frequency filters, Laplacian/Laplacian of Gaussian filter, and unsharp filter.
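
As an illustration of one of the filters listed above, the following is a minimal sketch of a median filter that removes salt-and-pepper noise from a small grayscale image. The function name `median_filter`, the list-of-lists image representation, and the choice to shrink the window at the image border are assumptions made for this example; they are not taken from any of the reviewed papers.

```python
from statistics import median_low


def median_filter(image, size=3):
    """Replace every pixel by the median of its size x size neighbourhood.

    image is a list of rows of grayscale values. At the border the window
    is simply cropped to the image (an assumption of this sketch).
    median_low keeps the result inside the set of original pixel values.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    filtered = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            filtered[y][x] = median_low(window)
    return filtered


# A flat region with a single "salt" pixel in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
cleaned = median_filter(noisy)
```

The isolated bright pixel is replaced by the value of its neighbours, while a mean filter would instead smear it over the neighbourhood.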

As seen in Table 2.2, the earlier basic datasets have been updated and
enriched by new methods that derive new features from the features of
these datasets. Analysis of the literature shows that researchers are
trying to find the most discriminative features in order to obtain better
classification results.

2.4 CONCLUSION


This chapter presented a comprehensive review of the prediction of
Parkinson’s disease (PD) by examining papers that use machine learning-based
approaches. These studies usually attempted to diagnose PD from voice and
handwriting datasets. Essentially, the diagnosis of PD is a classification
problem. In the medical diagnostic process, experts always need data and
tests to facilitate and support their decisions. Therefore, it is vital to
provide auxiliary data for the diagnosis of this disease with machine
learning methods.

In this study, the accuracy percentages reported for the methods used in
the related papers served as the basic criterion for comparing these
methods. Additionally, analysis of the literature showed that many
different preprocessing methods can lead to high accuracy. Since the data
in the handwriting datasets were collected in different ways, they required
a more detailed preprocessing stage, after which better results were
obtained. Another finding of the analysis was the importance of using
hybrid machine learning, deep learning, and mathematical algorithms. The
progress of the PD literature shows that new researchers need to find
efficient and practical approaches for the desired automatic PD system.

In conclusion, present models and results are not sufficient for a fully
automatic PD diagnosis system. Accordingly, current methods still need a
medical expert’s decision at the final diagnosis stage. However, this
literature helps to identify early symptoms of PD and encourages doctors
to order further medical tests to confirm the diagnosis. The literature
improves with new experiments and new ideas, and the common goal of all
these works is the design of an automatic disease detection system.

ABSTRACT


Hypertensive retinopathy (HR) is a retinal disease that is caused by
persistently high blood pressure (hypertension) and leads to vision loss. A
large number of people worldwide suffer from HR as a result of hypertension.
Hypertension causes abnormalities to appear on the retina. The disease has
no early signs and, in most cases, HR is diagnosed at later stages, when it
has already led to visual impairment or vision loss. It is therefore
essential for hypertensive patients to have regular eye examinations. This
chapter describes HR, including its classification, symptoms, and related
risk factors. It also presents a comparative analysis of the algorithms
proposed by researchers for the automatic detection of HR, covering both
conventional methods and machine learning approaches, and examines how
machine learning approaches achieve higher accuracy.


3.1 INTRODUCTION 


The human eye is an extremely complex structure that enables sight, one of
the most important of the human senses. Sight underlies our ability to
understand the world around us and to navigate within our environment. As
we look at the world around us, our eyes are constantly taking in light, a
component fundamental to the visual process. The retina is a layer of
nervous tissue that covers the inside of the back two-thirds of the eyeball,
in which stimulation by light occurs, initiating the sensation of vision.
The retina is actually an extension of the brain, formed embryonically from
neural tissue and connected to the brain proper by the optic nerve. The
retina functions specifically to receive light and to convert it into
chemical energy. The chemical energy activates nerves that conduct the
electrical messages out of the eye into the higher regions of the brain.


Smart animals have dumb retinas and 
dumb animals have smart retinas. 


Hypertensive retinopathy (HR) is the medical term for retinal damage
brought about by hypertension. It mainly affects the retina and the retinal
blood circulation. Because of hypertension, the retinal blood vessels (BV),
that is, the retinal arteries and retinal veins, are also affected, so blood
circulation to the retina deteriorates. The signs of HR depend on the
patient’s condition. Some cases may involve visual symptoms. HR signs
include vein changes, artery narrowing, vein narrowing, and deviation of
the angle at artery-vein crossings; the last is known as arteriovenous
nicking. Arteriovenous nicking mainly affects the crossing points of
arteries and veins.

This chapter gives a detailed description of HR, including its
classification or grading, symptoms, and related risk factors. It also
presents a comparative analysis of the algorithms proposed by researchers
for the automatic detection of HR, covering conventional methods and
machine learning (ML) approaches, and examines how ML approaches achieve
higher accuracy.


3.1.1 THE EYE FUNDUS 


The modern field of ophthalmology was born from centuries of observation
and discovery that eventually became grounded in scientific knowledge. A
significant advance in the understanding and diagnosis of eye diseases was
the development in the nineteenth century of the ophthalmoscope, an
instrument for examining the interior of the eye. With this device,
ophthalmologists could readily examine the retina and its blood vessels,
thereby gaining valuable information about the inner eye and eye diseases.

The ophthalmoscope is an instrument for examining the interior of the eye.
It was invented in 1850 by the German scientist and philosopher Hermann von
Helmholtz. The ophthalmoscope became a model for later forms of endoscopy.
The device consists of a strong light that can be directed into the eye by
a small mirror or prism. The light reflects off the retina and back through
a small hole in the ophthalmoscope, through which the examiner sees a
non-stereoscopic magnified image of the structures at the back of the eye,
including the optic disc, retina, retinal BVs, and macula, as shown in
Figure 3.1. The ophthalmoscope is particularly useful as a screening tool
for various ocular diseases, for example, diabetic retinopathy (DR).


[Labels: optic disc, fovea, central retinal vein, macula, central retinal
artery, retinal venules, retinal arterioles.]


FIGURE 3.1 The retinal fundus image.


Source: Reprinted with permission from Carolina Ophthalmology, open access.


Proper maintenance of the retina is essential for good vision. Several eye
diseases, such as DR, HR, retinopathy of prematurity, and retinal vein
occlusion, mainly affect the retina. If they remain undiagnosed for a long
time, they can lead to loss of vision. Specialists and physicians identify
these retinal diseases when signs such as hemorrhages (HEM), soft and hard
exudates (EX), optic disc swelling, and arteriolar narrowing are present.
Various parameters can be used to grade the severity of retinal diseases
(Fig. 3.2).


[Labels: superior arcade, macula, fovea, vein, inferior arcade.]


FIGURE 3.2 The right eye fundus image.


Source: Reprinted with permission from Ref. [11] © Elsevier.


The center of the fundus lies on the optical axis; this is the fovea, which
collects the best-resolved images and is usually associated with a small
yellow spot, the macula lutea. The anatomic and clinical foveola, fovea,
and macula are shown in the figure. The major vascular supply of the retina
forms the superior and inferior arcades of vessels. The retinal area
between the superior and inferior arcades is known as the area centralis or
posterior pole. The center of the posterior pole contains the macula, which
is redder (dark gray in the print version) and denser in color than the
surrounding retina. This is because more photoreceptors are packed there at
high densities and, in addition, there is more pigment behind the
photoreceptor cells. The macula lutea refers to the yellow xanthophyll
pigment within the retina at the center of the macula. The center of the
macula is referred to as the fovea, which is 500 μm in diameter and is an
avascular area that consists essentially of the internal limiting membrane
and concentrated cone photoreceptor cells, known as the bouquet of
Rochon-Duvigneaud. The major vessels visible in this color fundus
photograph lie in the superior retina.


3.2 HYPERTENSIVE RETINOPATHY


The retina is an inner and important part of the human eye whose function
is to capture images and send them to the brain. It consists of various
structures along with two types of blood vessels, veins and arteries. These
retinal vessels are affected by a number of eye diseases. HR is caused by
persistently high blood pressure in the retinal BVs. Many people around the
world suffer from HR; however, in most cases, HR patients are unaware of
it. The presence of HR and its severity can be determined by an
ophthalmologic examination of the patient’s eye. Most often, HR is
diagnosed at a late stage, when it has already led the patient to visual
impairment or vision loss; it is therefore important for hypertensive
patients to have regular eye examinations. There will probably be no
symptoms until the condition has progressed extensively. Possible signs
and symptoms include reduced vision, eye swelling, bursting of a BV, and
double vision accompanied by headaches. It is best to seek immediate
medical help if blood pressure is high and vision changes suddenly.

Clinical findings of HR include the presence of lesions that can be
classified into two groups: soft exudates and hard exudates (HE). Soft
exudates are also known as cotton wool spots (CWS). CWS are fluffy
white-yellow spots seen in advanced stages of HR, whereas HE are bright
yellow lesions. CWS are observed either isolated in fundus images or
together with other lesions, such as HEM and HE, related to the tissue’s
blood supply. CWS are also found in the retinas of diabetic patients, but
they are more closely associated with HR than with DR. DR is characterized
by multiple HE and a few CWS, whereas numerous CWS are associated with HR.
The disease has no early signs and, in most cases, HR is diagnosed at later
stages, when it has led to visual impairment or vision loss. It is
therefore essential for hypertensive patients to have regular eye
examinations.

Prolonged high blood pressure, or hypertension, is the main cause of HR.
Hypertension is a chronic problem in which the force of the blood against
the arteries is too high. The force is a result of the blood pumping out of
the heart and into the arteries, as well as the force created as the heart
rests between beats. When blood moves through the BVs at higher pressure,
it eventually causes damage by stretching the arteries. This leads to many
problems over time. HR mainly occurs after blood pressure has been
consistently high over a prolonged period. Blood pressure (BP) can be
raised by a lack of daily physical activity, being overweight, too much
salt in the daily diet, and daily stress, and persistently high BP in turn
leads to HR.

HR is diagnosed on the basis of its clinical appearance on a dilated
funduscopic examination together with coexistent hypertension. The doctor
uses an ophthalmoscope to examine the retina. It shines a light through the
pupil to examine the back of the eye for signs of narrowing BVs or to check
whether any fluid is leaking from the BVs. This procedure is painless and
takes less than 10 min to complete. In some cases, a special test called
fluorescein angiography (FA) is performed to examine retinal blood flow. In
this procedure, the doctor applies special eye drops to dilate the pupils
and then takes photographs of the eye. After taking these pictures, the
doctor injects a dye called fluorescein into a vein, typically on the
inside of the elbow. As the dye moves into the BVs of the eye, further
retinal images are taken. Acute malignant hypertension causes patients to
complain of eye pain, headaches, or reduced visual acuity. Chronic
arteriosclerotic changes from hypertension do not cause any symptoms on
their own. However, the complications of arteriosclerotic hypertensive
changes cause patients to present with the typical symptoms of vascular
occlusions or microaneurysms (MA). For any disease, it is better to know
its severity so that the required treatments and precautions against
further progression can be taken. For this purpose, the disease must be
graded. The following section discusses the grading and classification of
HR.


3.2.1 GRADING HR 


Keith and colleagues developed the first classification system for HR in
1939. Since then, the original model has been criticized for the
reproducibility and validity of the method in clinical practice. Others,
such as Hayreh, claim that the levels of retinopathy may not correspond to
the extent of systemic hypertension. Some have suggested, however, that the
classifications could be associated with heart disease. In particular,
recent work connects the revised Keith-Wagener-Barker model defined by
Mitchell and Wong to target end-organ damage. The following are the HR
classification systems, based on examination of the retinal fundus image
with the help of an ophthalmoscope, as shown in Tables 3.1 to 3.3.


3.2.1.1 KEITH-WAGENER-BARKER CLASSIFICATION (1939) 


Patients were grouped according to their ophthalmoscopic findings. This
was, therefore, the first method to associate retinal findings with the
state of the hypertensive disease. The grades are as follows:


TABLE 3.1 The Keith-Wagener-Barker HR Classification Based on the Retinal Fundus
Image Severity.


Grade | Classification | Symptoms
I (mild hypertension) | Mild generalized retinal arteriolar narrowing or sclerosis | No symptoms
II (more marked HR) | Definite focal narrowing and arteriovenous nicking; moderate to marked sclerosis of the retinal arterioles; exaggerated arterial light reflex | Asymptomatic
III (mild angiospastic retinopathy) | Retinal HEM, EX, and CWS; sclerosis and spastic lesions of retinal arterioles | Symptomatic
IV | Grade III + papilloedema (severe) | Reduced survival

Source: Adapted from Ref. [10].


TABLE 3.2 Mitchell-Wong Classification of HR.


Grade | Classification
I (mild retinopathy) | Arteriolar narrowing, AV nicking, and/or arteriolar wall opacity
II (moderate retinopathy) | HEM, MA, CWS, and/or hard EX
III (malignant retinopathy) | Grade II + optic disc (OD) swelling

Source: Adapted from Ref. [10].


The HR severity is measured in terms of the artery-to-vein ratio, generally
known as the AVR. The AVR is calculated using the Central Retinal Arterial
Equivalent (CRAE) and Central Retinal Venous Equivalent (CRVE)
measurements. These measurements are described by the Parr-Hubbard
formulas. The lists “Arteriole” and “Venule” contain the mean widths of the
artery and vein segments within the region of interest.

TABLE 3.3 Scheie Classification of HR.


Grade | Scheie classification | Scheie classification based on light reflex changes from arteriolosclerotic changes | Modified Scheie classification
0 | Diagnosis of hypertension but no visible retinal abnormalities | Normal | No changes
1 | Diffuse arteriolar narrowing; no focal constriction | Broadening of light reflex with minimal arteriovenous compression | Barely detectable arterial narrowing
2 | More pronounced arteriolar narrowing with focal constriction | Light reflex changes and crossing changes more prominent | Obvious arterial narrowing with focal irregularities
3 | Focal and diffuse narrowing with retinal HEM | Copper-wire appearance; more prominent arteriovenous compression | Grade 2 + retinal HEM and/or EX
4 | Retinal edema, hard EX, OD edema | Silver-wire appearance; severe arteriovenous crossing changes | NA


Source: Adapted from Ref. [40].


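$$ \mathrm{CRAE} = \sqrt{0.87\,W_a^2 + 1.01\,W_b^2 - 0.22\,W_a W_b - 10.73} $$


where W_b is the median value of “Arteriole” and W_a is the value in the
same list immediately before the median.


$$ \mathrm{CRVE} = \sqrt{0.72\,W_a^2 + 0.91\,W_b^2 + 450.02} $$


For CRVE, the same two values are taken from the “Venule” list.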


As shown in Figure 3.3, if the AVR is between 0.667 and 0.75, the retinal
image is graded as normal. If the AVR is 0.5, 0.33, 0.25, or <0.20, the
retinal image is graded as Grade-1, Grade-2, Grade-3, or Grade-4,
respectively.
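
The calculation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited works. It assumes that AVR = CRAE / CRVE, that the median of an even-length list is its upper middle element, and that the same rule for W_a and W_b applies to the “Venule” list. Because the text gives single reference values (0.5, 0.33, 0.25) rather than intervals, the hypothetical `grade_hr` function assigns an intermediate AVR to the nearest reference value and treats any AVR of 0.667 or more as normal; both choices are assumptions of this sketch.

```python
import math


def _median_pair(widths):
    """Return (W_a, W_b): the value just before the median, and the median."""
    ordered = sorted(widths)
    if len(ordered) < 2:
        raise ValueError("at least two vessel widths are required")
    mid = len(ordered) // 2  # upper median for even lengths (assumption)
    return ordered[mid - 1], ordered[mid]


def crae(arteriole):
    """Central Retinal Arterial Equivalent from mean arteriole widths."""
    w_a, w_b = _median_pair(arteriole)
    radicand = 0.87 * w_a**2 + 1.01 * w_b**2 - 0.22 * w_a * w_b - 10.73
    if radicand < 0:
        raise ValueError("widths too small for the CRAE formula")
    return math.sqrt(radicand)


def crve(venule):
    """Central Retinal Venous Equivalent from mean venule widths."""
    w_a, w_b = _median_pair(venule)
    return math.sqrt(0.72 * w_a**2 + 0.91 * w_b**2 + 450.02)


def avr(arteriole, venule):
    """Artery-to-vein ratio, taken here as CRAE / CRVE."""
    return crae(arteriole) / crve(venule)


def grade_hr(ratio):
    """Map an AVR value to an HR grade (nearest-reference rule is assumed)."""
    if ratio >= 0.667:
        return "Normal"
    if ratio < 0.20:
        return "Grade-4"
    references = {"Grade-1": 0.5, "Grade-2": 0.33, "Grade-3": 0.25}
    return min(references, key=lambda grade: abs(references[grade] - ratio))


# Illustrative widths in pixels; these numbers are made up for the example.
example_avr = avr([10, 12, 14], [14, 16, 18])
example_grade = grade_hr(example_avr)
```

In practice the width lists would come from the vessel segments measured in the region of interest around the optic disc.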

The differential diagnosis for HR with diffuse retinal HEM, CWS, and hard
EX includes, most notably, DR. DR can be distinguished from HR by
evaluation for the respective underlying systemic diseases. Other
conditions with diffuse retinal HEM that can resemble HR include radiation
retinopathy, anemia and other blood dyscrasias, ocular ischemic syndrome,
and retinal vein occlusion.

The severity and extent of hypertension are usually the key determinants of
hypertensive retinopathy. The changes described in the sections above,
however, are not specific to hypertension. Similar changes can be seen in
many vascular-risk disorders, such as diabetes. Often, when both diabetes
and hypertension are present, retinopathy may be more severe and
progressive. Certain factors, such as hyperlipidemia, may also aggravate
retinopathy.


[Flowchart: person with HR → acquiring fundus images → AVR calculation →
grade. AVR 0.667-0.75: Normal; 0.5: Grade-1; 0.33: Grade-2; 0.25: Grade-3;
<0.20: Grade-4.]


FIGURE 3.3 Grading of HR.


Source: Reprinted with permission from Ref. [39].


Figure 3.4 shows typical digital retinal fundus photographs of mild (a, b),
moderate (c, d), and malignant (e, f) HR, as graded with the improved
classification. (a) Mild HR is indicated by the presence of generalized
arteriolar narrowing, arteriovenous nicking, and opacification of the
arteriolar wall (“copper wiring”). (b) Mild HR with focal arteriolar
narrowing. (c and d) Moderate HR with multiple retinal HEM and CWS. (e and
f) Malignant HR with swelling of the OD, retinal HEM, hard EX, and CWS.


3.3 PERFORMANCE METRICS


The performance measures used for retinal image segmentation in HR are
summarized here; accuracy (Acc), sensitivity (Se), and specificity (Sp) are
the most frequently used measures.

FIGURE 3.4 HR grades.


Source: Reprinted with permission from public database STARE, open access,
https://cecas.clemson.edu/~ahoover/stare/.


Retinal vessel classification is evaluated in terms of correctly classified
vessel pixels (TP, true positive) and non-vessel pixels (TN, true
negative), and incorrectly classified vessel pixels (FP, false positive)
and non-vessel pixels (FN, false negative). TP means that the pixel is a
vessel in both the segmented and the ground truth image, while TN means
that the pixel is non-vessel in both. FP means that the pixel is a vessel
in the segmented image but non-vessel in the observer-marked image;
likewise, FN means that the pixel is a vessel in the ground truth but
non-vessel in the segmented image. These terms are used to evaluate
performance.


3.3.1 ACCURACY


Acc is defined as the ratio of correctly identified pixels to the total
number of pixels present in the image.


$$ Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.1} $$
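
The four pixel counts and eq 3.1 can be sketched in a few lines. The function names `confusion_counts` and `accuracy` and the representation of the masks as lists of rows of 0/1 values are assumptions made for this illustration.

```python
def confusion_counts(segmented, ground_truth):
    """Count TP, TN, FP, FN between two binary masks of equal shape.

    A value of 1 marks a vessel pixel and 0 a non-vessel pixel.
    """
    tp = tn = fp = fn = 0
    for seg_row, gt_row in zip(segmented, ground_truth):
        for seg, truth in zip(seg_row, gt_row):
            if seg and truth:
                tp += 1  # vessel in both images
            elif not seg and not truth:
                tn += 1  # non-vessel in both images
            elif seg and not truth:
                fp += 1  # vessel only in the segmented image
            else:
                fn += 1  # vessel only in the ground truth
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Eq 3.1: correctly identified pixels over all pixels."""
    return (tp + tn) / (tp + tn + fp + fn)


# Tiny made-up masks, for illustration only.
segmented = [[1, 1, 0], [0, 0, 1]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
counts = confusion_counts(segmented, ground_truth)
acc = accuracy(*counts)
```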


3.3.2 SENSITIVITY AND SPECIFICITY 


Se and Sp are statistical measures of the performance of a binary
classification test in HR. As expressed in eq 3.2, Se, also referred to as
the TP rate, measures the proportion of positives (both TP and FN) that are
correctly identified.

As expressed in eq 3.3, Sp measures the proportion of negatives (both TN
and FP) that are correctly identified. Although a high Se reflects the
desirable tendency of an algorithm to detect vessels, a high Se with a low
Sp shows that the segmentation includes many pixels that do not belong to
vessels, that is, a high FP count.


$$ Se = \frac{TP}{TP + FN} \tag{3.2} $$

$$ Sp = \frac{TN}{TN + FP} \tag{3.3} $$

3.3.3 POSITIVE PREDICTED VALUE 


It measures the probability that a pixel identified as a BV pixel is truly
a BV pixel, and it is expressed in eq 3.4.


$$ PPV = \frac{TP}{TP + FP} \tag{3.4} $$
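
Eqs 3.2 to 3.4 translate directly into code. The sketch below uses hypothetical function names and made-up counts; returning NaN when a denominator is zero is a choice made for this example, not something prescribed by the text.

```python
def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(tp, fn):
    """Eq 3.2: share of true vessel pixels that were detected."""
    return _ratio(tp, tp + fn)


def specificity(tn, fp):
    """Eq 3.3: share of true non-vessel pixels that were rejected."""
    return _ratio(tn, tn + fp)


def positive_predictive_value(tp, fp):
    """Eq 3.4: share of detected vessel pixels that are truly vessel."""
    return _ratio(tp, tp + fp)


# Made-up counts: a segmentation that finds most vessels but over-segments.
tp, fn, tn, fp = 90, 10, 600, 300
se = sensitivity(tp, fn)                  # high Se
sp = specificity(tn, fp)                  # low Sp
ppv = positive_predictive_value(tp, fp)   # many detections are wrong
```

With these counts Se is high while Sp and PPV are low, which is the situation described above: many pixels that do not belong to vessels are included in the segmentation.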


3.3.4 AREA UNDER CURVE 


AUC is computed here from the true positive rate and the true negative rate
(one minus the false positive rate), as indicated in eq 3.5.


$$ AUC = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \tag{3.5} $$


3.4 METHODS


There are several methods or techniques available for HR grading. They
are broadly classified into two types, that is, conventional approaches
and ML-based approaches. The basic idea of these two approaches is
presented in the following sections.


3.4.1 CONVENTIONAL METHOD 


Fundus photography involves photographing the back of the eye, also called
the fundus. Specialized fundus cameras, consisting of an intricate
microscope attached to a flash-enabled camera, are used in fundus
photography. The main structures that can be visualized on a fundus
photograph are the central and peripheral retina, OD, and macula. The
general approach for HR grading using conventional methods is shown in
Figure 3.5.


[Flowchart: input fundus image → plane separation → enhancement → blood
vessel segmentation → OD detection → artery/vein classification → AVR
ratio → HR grading.]


FIGURE 3.5 Conventional approach for HR grading.


A fundus image is an RGB color image; in general, RGB images consist of
three channels (red, green, and blue). Processing can be simplified by
separating the retinal image into its three channels and using only one of
them (the green channel). The blue channel is characterized by low contrast
and does not contain much information. The vessels are also visible in the
red channel.

The input image is resized, and the red or green channel image is
separated, as the vessels appear more clearly in the red or, alternatively,
the green channel image. Then, a morphological operation is performed on
the red or green channel image. The basic morphological operations are
dilation and erosion. The more complex morphological operations are opening
and closing. Dilation is an operation that grows or thickens objects in a
binary image. The specific manner and extent of this thickening are
controlled by a shape referred to as a structuring element. Dilation is
defined in terms of a set operation. Erosion shrinks or thins objects in a
binary image. The manner and extent of shrinking are likewise controlled by
a structuring element.
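
The channel separation and the basic morphological operations can be illustrated with a small pure-Python sketch. The function names, the list-of-lists image format, the symmetric structuring element, and the treatment of pixels outside the image as background are all assumptions of this example rather than details taken from the cited methods.

```python
def green_channel(rgb_image):
    """Keep only the green value of every (r, g, b) pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def _morph(binary, element, combine):
    """Apply a structuring element; pixels outside the image count as 0."""
    height, width = len(binary), len(binary[0])
    e_h, e_w = len(element), len(element[0])
    off_y, off_x = e_h // 2, e_w // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            covered = []
            for j in range(e_h):
                for i in range(e_w):
                    if element[j][i]:
                        yy, xx = y + j - off_y, x + i - off_x
                        inside = 0 <= yy < height and 0 <= xx < width
                        covered.append(binary[yy][xx] if inside else 0)
            result[y][x] = 1 if combine(covered) else 0
    return result


def dilate(binary, element):
    """Grow objects: a pixel is set if ANY covered pixel is set."""
    return _morph(binary, element, any)


def erode(binary, element):
    """Shrink objects: a pixel is set only if ALL covered pixels are set."""
    return _morph(binary, element, all)


def opening(binary, element):
    """Erosion followed by dilation; removes small bright specks."""
    return dilate(erode(binary, element), element)


def closing(binary, element):
    """Dilation followed by erosion; fills small gaps."""
    return erode(dilate(binary, element), element)


square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3 x 3 structuring element
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1                               # a single foreground pixel
thick = dilate(dot, square)                 # becomes a 3 x 3 block
thin = erode(thick, square)                 # shrinks back to one pixel
```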

The next step is image enhancement. Image enhancement techniques are
mathematical methods aimed at improving the quality of a given image. The
result is another image that shows certain features in a way that is
better, in some sense, than their appearance in the original image. One may
also derive or compute several processed versions of the original image,
each presenting a selected feature with an enhanced appearance. Simple
image enhancement techniques are developed and applied in an ad hoc manner.
Advanced techniques that are optimized with respect to specific
requirements and objective criteria are also available.

Some of the image enhancement techniques are listed below:


1. Filtering with morphological operators.
2. Histogram equalization.
3. Noise removal using a Wiener filter.
4. Linear contrast adjustment.
5. Median filtering.
6. Unsharp mask filtering.
7. Contrast limited adaptive histogram equalization (CLAHE).
8. Decorrelation stretch.

Adaptive histogram equalization (AHE) is a computer image processing
technique used to improve contrast in images. It differs from ordinary
histogram equalization in that the adaptive method computes several
histograms, each corresponding to a distinct section of the image, and uses
them to redistribute the lightness values of the image. It is therefore
suitable for improving local contrast and enhancing the definition of edges
in each region of an image. However, AHE tends to overamplify noise in
relatively homogeneous regions of an image. A variant of adaptive histogram
equalization called CLAHE prevents this by limiting the amplification.
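
The effect of limiting the amplification can be illustrated with a deliberately simplified sketch. It performs histogram equalization on the whole image and optionally clips the histogram before building the mapping, which is the clipping idea behind CLAHE. It is not a full CLAHE implementation: there are no tiles and no interpolation between tiles. The function name `clipped_equalize` and its parameters are hypothetical.

```python
def clipped_equalize(gray, clip_limit=None, levels=256):
    """Globally equalize a grayscale image given as rows of ints.

    If clip_limit is set, histogram bins above it are clipped and the
    excess is spread evenly over all bins, which flattens the mapping
    and limits how strongly dominant gray levels are stretched.
    """
    histogram = [0] * levels
    for row in gray:
        for value in row:
            histogram[value] += 1

    if clip_limit is not None:
        excess = sum(max(0, count - clip_limit) for count in histogram)
        histogram = [min(count, clip_limit) + excess / levels
                     for count in histogram]

    total = sum(histogram)
    lookup, running = [], 0.0
    for count in histogram:
        running += count
        # Cumulative distribution scaled to the output gray range.
        lookup.append(round(running / total * (levels - 1)))
    return [[lookup[value] for value in row] for row in gray]


# Toy image with only 4 gray levels, for illustration.
dark = [[0, 0], [1, 1]]
plain = clipped_equalize(dark, levels=4)
limited = clipped_equalize(dark, clip_limit=1, levels=4)
```

Without clipping the two dark levels are pushed to the top of the range; with clipping the mapping stays closer to the identity.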

The BVs are the main anatomical structures visible in retinal images. The
segmentation of retinal BVs has been accepted worldwide for the diagnosis
of both cardiovascular diseases (CVD) and retinal diseases. An appropriate
vessel segmentation method is therefore required for the automatic
detection of retinal diseases, for example, diabetic retinopathy and
cataract. The detection of retinal diseases using computer-aided diagnosis
can help people avoid the risks of visual impairment and, moreover, save
medical resources.

The processing of the retinal fundus image is the preliminary step for the
vessel segmentation task. It involves several steps, for example, capturing
a photograph of the eye containing the vessels, vessel enhancement, noise
removal, and evaluating the performance using various measures.

Using ordinary segmentation techniques, we can detect only the BVs as a
whole. Color image segmentation is therefore the best way to identify
retinal problems, since with it we can separate the retinal veins and
arteries.

BV extraction from color fundus images is mainly carried out to separate
the arteries and veins. Retinal vascular network extraction can be
performed using very high-resolution color fundus images. It has some
drawbacks, for example, the central light reflex and artifacts present in
the input retinal image. The proposed system for color retinal BV
segmentation consists of a combination of morphological techniques to
detect the vessels.

Some of the BV segmentation methods are shown in Figure 3.6.


[Diagram: BV segmentation. Supervised methods: SVM, ANN. Unsupervised
methods: matched filtering, mathematical morphology, model-based methods,
vessel tracking methods.]


FIGURE 3.6 BV Segmentation methods.
Source: Reprinted with permission from Ref. [22]. © 2020 Springer.


The next step is optic disc (OD) detection. In retinal images, the OD is a
central anatomical structure. The ability to detect the OD in retinal
images plays a major role in automatic screening systems. The step that
follows OD detection is retinal vessel classification. Based on thresholds
applied to the pixel values, retinal vessels are classified as arteries or
veins. As mentioned in the sections above, the AVR is then calculated, and
HR grading is performed on that basis.


Machine Learning Algorithms for Hypertensive Retinopathy 53 


3.4.2 MACHINE LEARNING METHODS 


Classification of retinal fundus images has become one of the main pilot
applications used to demonstrate ML. Convolutional neural networks (CNNs)
are a kind of deep neural network (DNN) that generates fairly accurate
results when used to classify retinal fundus images. The general approach
for grading HR using ML is shown in Figure 3.7.


[Flowchart, training path: input fundus image → preprocessing → blood
vessel segmentation → feature extraction → CNN training → trained model.
Testing path: test image → preprocessing → blood vessel segmentation →
trained model → HR grading.]


FIGURE 3.7 Machine learning approach for HR grading.


Retinal images contain information relevant to HR as well as noisy and
irrelevant pixels. The removal of unwanted pixels is called preprocessing.

Preprocessing of the image is the key to better retinal vessel segmentation
before the classifier training and testing stage. This step may involve
various operations and techniques depending on the requirements of the
classifier. It is usually performed for noise reduction, vessel
enhancement, outlier removal, and so forth.

Some of the preprocessing techniques are as follows:


i. Transforming an image
ii. Extracting green channel image
iii. Morphological operations
iv. Knowledge base processing
v. Enhancing the image
vi. Filtering
vii. BV segmentation
viii. Extracting features
ix. Selecting features
x. Restoring the images

The steps that follow preprocessing are described in the earlier sections.
After the desired features have been selected from the preprocessed retinal
fundus image, the CNN model can be trained.

A CNN is a specific type of artificial neural network that uses
perceptrons, a machine learning unit algorithm, for supervised learning to
analyze data. CNNs are applied to image processing, natural language
processing, and other kinds of cognitive tasks. A CNN is also known as a
“ConvNet.” Like other kinds of artificial neural networks, a CNN has an
input layer, an output layer, and various hidden layers. Some of these
layers are convolutional, using a mathematical model to pass results on to
successive layers. This simulates some of the actions in the human visual
cortex. CNNs are a fundamental example of deep learning, where a more
sophisticated model pushes the evolution of artificial intelligence by
offering systems that simulate different types of biological human brain
activity. The components of a CNN are as follows:


3.4.2.1 CONVOLUTIONAL LAYER 


The convolution layer is an important layer that extracts features from
the input image. Convolution preserves the relationship between pixels
by learning image features from small square patches of the input data.
It is a mathematical operation that takes two inputs: the image pixels
arranged in a matrix, and a kernel or filter. Depending on the filter
applied to an image, we can detect edges, or sharpen or blur the image.

The result of convolving the pixel values of an image with the filter
matrix is called the “Feature Map.” Let the image matrix have dimensions
h x w x d and the filter have dimensions f_h x f_w x d; then the output
dimensions are (h - f_h + 1) x (w - f_w + 1) x 1. The image matrix
multiplied with the filter is shown in Figure 3.8.
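
The output dimensions above can be checked with a small sketch. It is an illustration only, not the chapter's implementation: it assumes a single-channel image (d = 1), a stride of 1, no padding, and the sliding dot product without kernel flipping that CNN libraries usually call convolution. The function name `convolve2d_valid` and the sample values are hypothetical.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    h, w = len(image), len(image[0])
    fh, fw = len(kernel), len(kernel[0])
    out_h, out_w = h - fh + 1, w - fw + 1  # (h - f_h + 1) x (w - f_w + 1)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window element-wise by the kernel and sum the products.
            total = 0
            for u in range(fh):
                for v in range(fw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5 x 5 binary image and 3 x 3 kernel.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d_valid(image, kernel)
# feature_map == [[4, 3, 4], [2, 4, 3], [2, 3, 4]], a 3 x 3 output since 5 - 3 + 1 = 3
```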


3.4.2.2 STRIDE 


Stride is the number of pixels by which the kernel shifts over the
input image matrix. For example, if the stride is 1, we move the kernel
1 pixel at a time, and if the stride is 2, we move the kernel 2 pixels
at a time over the input image matrix.


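[Figure 3.8: an image matrix of height h and width w multiplied with a
kernel, giving an output of size (h - f_h + 1) x (w - f_w + 1).]


FIGURE 3.8 Image pixels matrix multiplied with kernel or filter matrix.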


Source: Reprinted from Ref. [3]. Open access. 


3.4.2.3 PADDING 


In some cases, the kernel does not fit the input image matrix exactly.
In such cases, there are two options:


i. We can add zeros to the input image matrix so that the kernel or
filter fits (zero padding)

ii. We can drop the part of the input image where the kernel or
filter does not fit
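
The effect of stride and padding on the output size can be sketched as follows. The formula floor((n + 2p - f) / s) + 1 is the standard relation for one axis; it is not stated in the text above, and the helper names `conv_output_size` and `zero_pad` are hypothetical. The floor division corresponds to the second option, where the part of the image that the kernel cannot cover is dropped.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output length along one axis for input length n and kernel length f."""
    return (n + 2 * padding - f) // stride + 1


def zero_pad(image, p):
    """Surround a 2D image (list of rows) with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


same_size = conv_output_size(5, 3, stride=1, padding=1)  # padding keeps the size at 5
strided = conv_output_size(5, 3, stride=2)               # a stride of 2 shrinks it to 2
padded = zero_pad([[1, 2], [3, 4]], 1)
```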


ReLU stands for Rectified Linear Unit, a non-linear operation.

The feature maps may contain negative values. ReLU introduces
non-linearity into the ConvNet by replacing every negative value with
zero and leaving non-negative values unchanged.


The output of the ReLU is f(x) = max(0, x)
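
Applied element-wise to a feature map, the ReLU formula can be sketched as below. The function name `relu` and the sample values are hypothetical.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every element of a 2D feature map."""
    return [[max(0, value) for value in row] for row in feature_map]


rectified = relu([[-3, 2], [0, -1]])
# rectified == [[0, 2], [0, 0]]
```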


3.4.2.4 POOLING 


The first key ingredient that has made CNNs very effective is pooling.
Pooling is a vector-to-scalar transformation that operates on each
local region of an image, just as convolutions do. Unlike convolutions,
however, pooling layers have no filters and do not compute dot products
with the local region. Instead, they compute the average of the pixels
in the region (Average Pooling) or simply pick the pixel with the
highest intensity and discard the rest (Max Pooling).
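
Both pooling variants can be sketched in a few lines. This sketch assumes non-overlapping windows, that is, a stride equal to the window size, which is a common choice but is not specified in the text. The function name `pool2d` and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping size x size window to a single value."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h - size + 1, size):
        row = []
        for j in range(0, w - size + 1, size):
            window = [feature_map[i + u][j + v]
                      for u in range(size) for v in range(size)]
            if mode == "max":
                row.append(max(window))                # Max Pooling
            else:
                row.append(sum(window) / len(window))  # Average Pooling
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
max_pooled = pool2d(fm, mode="max")      # [[6, 8], [3, 4]]
avg_pooled = pool2d(fm, mode="average")  # [[3.75, 5.25], [2.0, 2.0]]
```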


3.4.2.5 DROPOUTS


Overfitting is a phenomenon whereby a network works well on the
training set yet performs poorly on the test set. This is often due to
excessive dependence on the presence of specific features in the
training set. Dropout is a technique for combating overfitting. It
works by randomly setting some activations to 0, essentially killing
them. By doing this, the network is forced to explore more ways of
classifying the images instead of depending too heavily on certain
features. This was one of the key elements in AlexNet.
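
A minimal sketch of dropout on a list of activations follows. The rescaling of the surviving activations by 1 / (1 - rate), often called inverted dropout, is an assumption added here so that the expected activation stays the same; the text above only describes zeroing. The function name `dropout` and its parameters are hypothetical.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Zero each activation with probability `rate` during training."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at test time
    keep = 1.0 - rate
    # Survivors are scaled by 1 / keep (assumed inverted-dropout convention).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
```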


3.4.2.6 BATCH NORMALIZATION 


A major problem with neural networks is vanishing gradients. This is
a situation whereby the gradients become too small and, consequently,
training suffers badly. Ioffe and Szegedy from Google Brain found that
this was largely due to internal covariate shift, a situation that
arises from the change in the distribution of the data as it flows
through the network. To address it, they devised the technique known
as batch normalization. This works by normalizing each batch of images
to have zero mean and unit variance. It is usually placed before the
non-linearity (ReLU) in CNNs. It significantly improves accuracy while
greatly speeding up the training process.
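
The normalization step can be sketched for a batch of scalar activations. The small constant `eps` and the scale and shift parameters `gamma` and `beta` follow the usual formulation of batch normalization and are assumptions here, since the text does not mention them. The function name `batch_normalize` is hypothetical.

```python
import math


def batch_normalize(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize a batch of scalar activations to zero mean and unit variance."""
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((x - mean) ** 2 for x in batch) / n
    # eps guards against division by zero when the batch has no spread.
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
```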


3.4.2.7 DATA AUGMENTATION


The last ingredient required for modern ConvNets is data augmentation.
The human vision system is excellent at adapting to image translations,
rotations, and other forms of distortion. Take an image and flip it in
any way, and most people can still recognize it. ConvNets, however,
are not very good at handling such distortions; they can fail badly
because of minor translations. The key to resolving this is to randomly
distort the training images, using horizontal flipping, vertical
flipping, rotation, whitening, shifting, and other distortions. This
enables ConvNets to learn how to handle these distortions, so that they
work well in the real world. Another common technique is to subtract
the mean image from each image and also divide it by the standard
deviation.
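
Two of these distortions and the normalization step can be sketched as follows. As a simplification, a single scalar mean and standard deviation are used instead of a per-pixel mean image; the function names and the 0.5 flip probability are hypothetical choices for illustration.

```python
import random


def flip_horizontal(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror a 2D image top to bottom."""
    return [row[:] for row in image[::-1]]


def augment(image, rng=random):
    """Randomly apply each flip with probability 0.5 (assumed value)."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = flip_vertical(image)
    return image


def standardize(image, mean, std):
    """Subtract the mean and divide by the standard deviation, pixel by pixel."""
    return [[(pixel - mean) / std for pixel in row] for row in image]


sample = [[1, 2], [3, 4]]
augmented = augment(sample, rng=random.Random(1))
standardized = standardize([[2.0, 4.0]], mean=3.0, std=1.0)  # [[-1.0, 1.0]]
```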


3.4.2.8 FULLY CONNECTED LAYER 


The fully connected layer is also known as the FC layer. The feature
map matrix is flattened into a vector and fed into the FC layer, as in
a regular neural network.

With the FC layers, the features are combined to create a model.
Finally, an activation function such as softmax or sigmoid classifies
the outputs into the different classes. A neural network with multiple
convolutional layers and the complete CNN architecture are shown in
Figures 3.9 and 3.10, respectively.
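
The flatten, fully connected, and softmax steps can be sketched as below. The weights and biases are hand-picked hypothetical values rather than trained parameters, and the function names are illustrative only.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a 1D vector, row by row."""
    return [value for row in feature_map for value in row]


def fully_connected(x, weights, biases):
    """One FC layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


x = flatten([[1, 2], [3, 4]])                                         # [1, 2, 3, 4]
logits = fully_connected(x, [[1, 0, 0, 0], [0, 0, 0, 1]], [0, 0])     # [1, 4]
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))
```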


[Figure 3.9: pipeline with the stages CONVOLUTION + RELU, POOLING,
CONVOLUTION + RELU, POOLING (feature learning), followed by FLATTEN,
FULLY CONNECTED, SOFTMAX (classification); example output class:
BICYCLE.]


FIGURE 3.9 Neural network with more than one convolutional layer.


Source: Reprinted from Ref. [3]. Open access.


FIGURE 3.10 Complete CNN architecture.


Source: Reprinted from Ref. [3]. Open access.


3.5 DATABASE


Several public retinal datasets with BV details are available. Training
and testing the classifier on a retinal database is a key step in
vessel segmentation. Databases such as DRIVE and STARE are publicly
available to researchers along with ground truth images of the vessels.
The performance of a classifier can be evaluated using these datasets.


3.5.1 DIGITAL RETINAL IMAGES FOR VESSEL EXTRACTION-DRIVE 


DRIVE is one of the commonly used datasets for retinal BV segmentation.
DRIVE consists of 40 retinal images, of which 33 are healthy while 7
show signs of mild diabetic retinopathy. A Canon CR5 non-mydriatic
camera with a 45° field of view (FOV) and eight bits per color channel
at 768 x 584 pixels was used to capture the images in JPEG format. Each
image has a circular FOV with a diameter of 540 pixels. The DRIVE
dataset is divided into a training set and a test set with 20 images
each. In the training set, 14 images were segmented by the first expert
and 6 images by the second. In the test set, the segmentation was
performed twice, in two cases. In case 1, the first and second experts
segmented 13 and 7 images, respectively, while case 2 was performed by
the third expert. In case 1 and case 2, the observers marked 12.7% and
12.3% of the pixels as vessels, respectively.


3.5.2 STRUCTURED ANALYSIS OF THE RETINA-STARE


This dataset consists of 400 retinal images, captured using a TOPCON
TRV-50 fundus camera with a 35° FOV and 8 bits per color channel at
605 x 700 pixels. The approximate extent of the FOV is 650 x 700
pixels. STARE has 20 vessel ground truth images used for vessel
segmentation, of which 9 are healthy while the rest show various kinds
of retinal diseases. Two experts manually segmented these images: the
first expert marked 10.4% of the pixels as vessel, while the second
expert marked 14.9%, including more of the thinner vessels. In general,
the segmentation of the first observer is used as the ground truth when
computing performance.


3.5.3 ANNOTATED DATASET FOR VESSEL SEGMENTATION AND
CALCULATION OF ARTERIOVENOUS RATIO-AVRDB


AVRDB is a recently created HR database that will be made publicly
available at www.biomisa.org in future for the research community. It
contains 100 fundus retinal images captured with a TOPCON TRC-NW8 and
annotated with the help of expert ophthalmologists from the Armed
Forces Institute of Ophthalmology. The vascular network is categorized
into arteriolar and venular patterns. The 100 images have dimensions of
1504 x 1000 and include retinal arteries, veins, AVR, and the whole
vascular structure as ground truths. The database also has annotation
at the image level for HR.


3.5.4 VICAVR 


This dataset consists of 58 retinal images. The dataset was used to
compute the artery/vein ratio, and the images were captured using a
TopCon NW-100 non-mydriatic camera, centered on the optic disc, at a
resolution of 768 x 584 pixels. The database contains the details of
the vessels measured at various radii from the optic disc along with
the type of vessel (artery or vein). The ground truth details were
marked by three image analysis experts.


3.5.5 INSPIRE AVR 


INSPIRE AVR contains 40 color images of the vessels and optic disc and
an arterial-venous ratio reference standard. The reference standard is
the average of the assessments of two experts using IVAN (a
semi-automated computer program developed by the University of
Wisconsin, Madison, WI, USA) on the images.

The retinal fundus image databases with the number of images available
for HR classification are listed in Table 3.4.


3.6 PROPOSED METHOD 


The retinal databases VICAVR (Fig. 3.12a-c) and STARE (Fig. 3.12d-f)
are used in the method. Figure 3.11 shows the steps of the proposed
method. The method takes the retinal images as input in the first step,
shown in Figure 3.12(i). Then the green channel of the fundus image
is extracted. The next step is to enhance the retinal image using CLAHE,
after which the OD is localized using morphological operations. The BVs
are then segmented and classified as arteries and veins. In the final
step, HR is classified based on the AVR.


TABLE 3.4 Retinal Database with Number of Fundus Images Available for HR.


Sl. No.   Database       Total images available
1         DRIVE          40
2         STARE          400
3         AVRDB          100
4         VICAVR         58
5         INSPIRE AVR    40


[Figure 3.11: flowchart with the blocks Input Fundus Image, Green
Channel Extraction, OD Detection & Localization, Vessel Segmentation,
Vessel Classification, AVR Calculation, and HR Detection.]


FIGURE 3.11 Proposed method.


3.6.1 GREEN CHANNEL 


It is the second step of HR detection. The green band of the fundus
image is isolated because the contrast between the BVs and the
background is higher in the green channel than in the red and blue
channels. The variation in background intensity is also smaller in the
green plane of the fundus image. The green channel image is shown in
Figure 3.12(ii).
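
Green channel extraction can be sketched as below, assuming the fundus image is held as rows of (R, G, B) tuples. That layout, the function name `green_channel`, and the sample pixel values are hypothetical.

```python
def green_channel(rgb_image):
    """Keep only the green component of every (R, G, B) pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


fundus = [
    [(120, 60, 20), (130, 90, 25)],
    [(110, 40, 15), (125, 85, 30)],
]
green = green_channel(fundus)
# green == [[60, 90], [40, 85]]
```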


3.6.2 CONTRAST LIMITED ADAPTIVE HISTOGRAM 
EQUALIZATION-CLAHE 


CLAHE is often used to enhance low-contrast retinal images. It derives
a transformation function for each pixel neighborhood using a
contrast-limiting procedure. CLAHE was introduced primarily to avoid
the over-amplification of noise. The enhanced image is shown in
Figure 3.12(iii).


3.6.3 OPTIC DISC LOCALIZATION 


One of the major steps in diagnosing HR is locating the OD. The nerves
that connect the retina to the brain enter and leave the retina through
the OD. Thus, the OD functions as both an entry point and an exit
point. The OD is localized by finding the maximum intensity level of
the average-filtered fundus image, and the region of interest is taken
as four times the radius of the OD. The localized OD is shown in
Figure 3.12(iv).
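
The localization rule can be sketched on a small synthetic image. This is one possible reading of the description, not the authors' code: the 3 x 3 filter size, the clipping of border windows, and the square region of interest extending four OD radii from the center are all assumptions, and the function names are hypothetical.

```python
def average_filter(image, k=3):
    """Mean filter with a k x k window; windows are clipped at the image border."""
    h, w = len(image), len(image[0])
    r = k // 2
    smoothed = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [image[u][v]
                      for u in range(max(0, i - r), min(h, i + r + 1))
                      for v in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


def locate_optic_disc(image, k=3):
    """Return (row, col) of the maximum of the average-filtered image."""
    smoothed = average_filter(image, k)
    _, i, j = max((value, i, j)
                  for i, row in enumerate(smoothed)
                  for j, value in enumerate(row))
    return i, j


def region_of_interest(center, od_radius, height, width):
    """Bounding box (top, bottom, left, right) reaching 4 * od_radius from center."""
    half = 4 * od_radius
    ci, cj = center
    return (max(0, ci - half), min(height, ci + half + 1),
            max(0, cj - half), min(width, cj + half + 1))


image = [[0] * 6 for _ in range(6)]
image[0][0] = 200                  # isolated bright noise pixel
for i in range(2, 5):
    for j in range(2, 5):
        image[i][j] = 100          # bright 3 x 3 blob standing in for the OD
center = locate_optic_disc(image)  # (3, 3): the blob wins over the noise pixel
roi = region_of_interest(center, od_radius=1, height=6, width=6)
```

Averaging before taking the maximum is what keeps the isolated noise pixel, which is the brightest raw value, from being mistaken for the OD.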


3.6.4 VESSEL SEGMENTATION AND CLASSIFICATION 


OD localization helps in segmenting and separating the arteries and
veins. Both supervised and unsupervised techniques are available for
classifying BVs. All of the supervised techniques perform pixel-based
classification; neural networks and support vector machines are the
main supervised approaches. The segmented BVs are shown in
Figure 3.12(v). The approach adopted here uses a neural network for
vessel classification, which is a supervised method. It is first
trained on the training images of the STARE and VICAVR databases and
then tested on images to classify vessels as arteries or veins. The
next step is to measure the artery and vein widths. To measure the
width, take the complement of the separated artery and vein images;
the complement converts each 0 pixel into 1 and each 1 pixel into 0.
Next, use the distance transform together with morphological thinning
to calculate the approximate or exact distance from each pixel of the
binary image to the nearest zero pixel. The distance is naturally zero
for pixels that are themselves zero.
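
A distance transform of this kind can be sketched with a breadth-first search. The sketch uses the city-block (4-neighbor) distance, takes a mask in which vessel pixels are 1, and measures the distance to the nearest 0 pixel directly; whether a complement is needed first depends on the convention of the distance-transform routine being used. The width estimate 2 * d - 1 from the centerline distance d is a rough assumption that holds for straight, axis-aligned vessels of odd width. The function name is hypothetical.

```python
from collections import deque


def distance_to_nearest_zero(mask):
    """City-block distance from every pixel to the nearest 0 pixel of a binary mask."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 0:
                dist[i][j] = 0  # zero pixels are at distance zero from themselves
                queue.append((i, j))
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                queue.append((ni, nj))
    return dist


# Hypothetical horizontal vessel, three pixels wide, on a zero background.
vessel_mask = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
dist = distance_to_nearest_zero(vessel_mask)
centerline_distance = max(max(row) for row in dist)  # value on the thinned centerline
approx_width = 2 * centerline_distance - 1
```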


3.6.5 AVR CALCULATION AND HR DETECTION 


The final step is to calculate the AVR, which is described in the
earlier sections. HR is then classified based on the AVR obtained, as
shown in Figure 3.3.


FIGURE 3.12 Output of the proposed method.


3.7 COMPARATIVE ANALYSIS 


This section reports the accuracy (Acc) obtained by various algorithms
on various databases. Abbasi et al. used a conventional approach for HR
detection on a locally available database and obtained a low Acc of
81%, whereas Irshad et al. obtained a very good result of 98.65% with
conventional methods on the VICAVR database. Using an ML approach for
grading HR, Syahputra et al. achieved the highest Acc of 100% on a
test sample of 20 images from the STARE database. These authors each
used only one database for HR detection, whereas the proposed method
uses both the VICAVR and STARE databases. The Acc for HR grading using
various algorithms is listed in Table 3.5 along with the database used.


3.8 CONCLUSION AND FUTURE WORK 


Hypertensive eye disease is detected by an automated process from
retinal blood vessel images. The vessels are extracted using different
methods. The methods discussed follow various steps, from preprocessing
to the segmentation of vessels and the calculation of the AVR. Each
step contributed to improving the results. The efficiency of the
different algorithms is analyzed by comparing them along with the
methods and datasets used.


TABLE 3.5 Comparison of Acc of Various Algorithms for HR Grading.


Author                         Algorithms or performance metrics   Database                     Acc (%)
Ortiz et al., 2010 [27]        Morphological operations            Hospital Universitario San   82
Manikis et al., 2011 [19]      AVR ratio                           DRIVE                        93.71
Ortiz et al., 2012 [28]        Morphological operations            Hospital Universitario       82
Irshad and Akram, 2014 [12]    AVR ratio                           AVRDB                        81.3
Khitran et al., 2014 [16]      Hybrid classifier                   DRIVE                        98
Abbasi and Akram, 2014 [1]     AVR ratio                           Local data                   81
Irshad et al., 2016 [13]       AVR ratio SVM                       VICAVR                       98.65
Syahputra et al., 2017 [33]    Probabilistic neural networks       STARE                        100
Ahmad et al., 2018 [3]         AVR ratio                           AVRDB                        89.4
Savant and Shenvi, 2019 [30]   AVR ratio                           DRIVE                        86.67
Kiruthika et al., 2019 [17]    Radon vessel tracking algorithm     DRIVE                        92.55
Triwijoyo et al., 2017 [34]    CNN                                 DRIVE                        98.6
Akbar et al., 2018 [4]         AVR ratio                           INSPIRE AVR                  97.50
Akbar et al., 2018 [5]         AVR ratio                           INSPIRE AVR                  98.76
Proposed method                AVR ratio                           VICAVR                       98.5
                                                                   STARE                        98.3


Many automated techniques are available for HR detection, yet there
is a need for a system that considers the complete fundus image for
automated HR detection and grading. This comparative analysis will
therefore support the development of automated computer-aided
diagnostic systems for HR detection and grading.


ABSTRACT


Big Image Data Processing (BIDP) refers to the processing of images
that are huge in terms of quantity, individual dimension, or individual
size with respect to memory. This chapter elaborates on methods to deal
with these three categories of images. In these scenarios, the data can
be stored using a distributed file system. To work with this amount of
data, different programming paradigms can be used, such as Hadoop’s
MapReduce, MATLAB’s MapReduce, and an integrated “Hadoop-MATLAB”
environment with MapReduce programming. The authors formed a Hadoop
cluster with 116 systems and processed 1.2 TB of text data for a word
count task. The authors also performed image retrieval on the Corel
1000, Corel 10,000, Brodatz Textures, Mirflickr, and ImageNet datasets
with this cluster configuration. The authors created and processed a
32768 x 32768 image and a 3.14 GB image using the MapReduce paradigm.
Applications of these technologies and methods include image retrieval
and object detection, which can be used in a multiresolution
environment as well.


4.1 INTRODUCTION


4.1.1 IMAGE AND VIDEO PROCESSING AND APPLICATIONS


An image is made up of picture elements, called pixels or image
elements. Each pixel has a numeric value that represents its intensity
and is located by its spatial coordinates x and y along the x-axis and
y-axis, respectively. An image can be in binary, grayscale, or color
format. Digital image processing means processing an image on a digital
computer [1]. Digital video is acquired as a time sequence of
two-dimensional spatial intensity arrays [2]. In simple words, images
or frames are displayed at a rate of 72 frames per second to make the
motion look continuous. Image and video processing have applications in
many fields, and the demand for them is growing with advancing
technology. Some of these applications are described below.

One application of image processing is restoration [3]. It modifies
images to remove noise and improve image quality, which in turn
benefits detection and retrieval applications. The medical field has
numerous applications in which image processing is used, such as
X-rays [4] and CT scans [5]. Image and video processing are also used
in object detection.

Object detection not only classifies but also gives the precise
location of the object in each image or frame of a video. At the same
time, object detection is a fundamental problem of computer vision that
has applications in image classification [6], human behavior
analysis [7], and autonomous driving [8]. In computer vision, many
robotic machines perform image-based tasks [9]. Robots employ image
processing methods to track paths, as in a line-follower robot, and to
detect obstacles. Another application is pattern recognition, where
image processing is combined with artificial intelligence for
recognition [10], modeling, and segmentation [11]. In the field of
video processing, video surveillance contributes hugely to Big Video
data. Monitoring surveillance videos is crucial for protection and
security in many metropolitan cities [12]. Big Video processing
requires efficient video compression and transmission to enable smart
cities with Internet of Things techniques, which further helps in
monitoring human activity information [13]. Video tracking of
suspicious vehicles, movements, etc., is a large-scale application of
video processing [14]. Another area where Big Video is applied is
transportation management. The development of an intelligent
transportation system requires processing in a Big Data
environment [15]. Managing the transport system gives rise to issues
like monitoring vehicle density and traffic congestion, wherein a large
quantity of videos needs to be processed using Big Data
mechanisms [16].


4.1.2 BIG DATA


Big Data is a means of describing data problems that cannot be solved
by traditional tools. In the early 2000s, it was accepted that Big Data
can be characterized by three Vs, that is, Volume, Variety, and
Velocity. But Big Data goes beyond these three Vs. To prepare for the
advantages and challenges of Big Data initiatives, it is characterized
by seven more Vs [17-19]. All ten Vs are explained in this section.

Volume: Volume is the quantity of data that we have. With the increase
in the number of new technologies and devices, there is exponential
growth in data. These data can be extremely valuable if they are
utilized properly. About 90% of all data ever created was generated in
the past 2 years. Therefore, the scale is what makes Big Data big.

Variety: One of the biggest challenges faced by Big Data is Variety.
Most of the data generated are unstructured and include various types
of data, from XML data to tweets, photos, and videos. Organizing these
data in a semantic way is difficult because the data themselves change
rapidly.

Velocity: Velocity denotes the speed at which data creation increases
and the speed at which relational databases can store, process, and
analyze data. The promise of real-time data processing attracts
interest because it allows companies to achieve tasks such as
displaying personalized advertisements on the websites a person visits
in accordance with that person’s recent history of search, viewing,
and purchase.

Veracity: Veracity refers to the accuracy of the data. Big Data loses
its value if it is not accurate, which requires discarding the noise
before beginning analysis. A simple example is contacts that enter a
marketing automation system with false names and inaccurate contact
information.

Value: Big Data has huge potential value. However, the cost of
discarding poor data is also huge, because data are worthless unless
they are analyzed to obtain accurate information.

Visualization: Once the data have been processed, they need to be
presented in an accessible and readable manner. A visualization can
involve a large number of parameters and variables, which has made it
one of the challenges of Big Data.

Variability: Variability differs from variety. A restaurant may have 20
different kinds of food items on the menu. However, if the same item
from the menu tastes different each day, that is variability. The same
applies to data: when the meaning of the data changes constantly, it
affects the homogeneity of the data.

Vulnerability: With a huge amount of data, concerns about security also
arise. A breach of Big Data can lead to the exploitation of important
information. Many Big Data breaches have been attempted, and many have
succeeded.

Volatility: Before the advent of Big Data, data were stored
indefinitely. But because of the volume and velocity of Big Data,
volatility needs to be considered. It needs to be established how long
data should be stored and when data should be considered irrelevant or
historic.

Validity: Validity refers to how accurate and correct the data are for
their intended use. Benefits from Big Data can be derived if the
underlying data are consistent in quality, metadata, and common
definitions.


4.1.3 BIG IMAGE DATA PROCESSING


The demand for processing an enormous number of images, images of large
dimension, and images of large size led the authors to explore new
technologies that can accomplish this [20,21]. In this process, “Big
Image/Video Data Processing” has evolved, as shown in Figure 4.1. The
relation between Big Data and Image Processing is shown in Figure 4.2.
Big Image/Video Data processing has addressed many technological
challenges, including storage, compression, analysis, transmission, and
recognition [22-24]. Big Image/Video Data processing plays an important
role in fulfilling modern-day technical demands such as intelligent
transport systems [25], big image classification and retrieval [26],
and human behavior monitoring [27].


4.1.4 CATEGORIES OF BIG IMAGE DATA PROCESSING 


The general perception of Big Image Data Processing is that it deals
with the processing of images that are huge in quantity. However, Big
Image implies (1) images that are large in quantity, as shown in
Figure 4.3; (2) individual images that are big with respect to
dimension (M x N), as shown in Table 4.1; and (3) individual images
that are big with respect to size, that is, the amount of storage
required to store them, as shown in Table 4.2.


[Figure 4.1: “Image/Video Processing” and “Big Data” combining into
“Big Image/Video Data Processing.”]


FIGURE 4.1 Evolution of “Big Image Data Processing.”


FIGURE 4.2 Relationship between big data and image processing.


In this chapter, the authors describe how to store and process these
three types of “big image” in a distributed environment. They also
present the methods, technologies, and implementation issues that they
encountered in making BIDP a success.


The objectives of this chapter are:

• To give different methods of handling Big Image Data.

• To discuss different technologies that can be used for processing
the Big Image Data.

• To share the experience with respect to the implementation.

• To give the applications of Big Image Data Processing.


FIGURE 4.3 A large number of images. 


4.2 BACKGROUND 


The images or videos to be processed are not just huge in quantity but
also enormous in size and dimension. Therefore, given the existing
technologies and environment, it is not possible to process these data
without compromising on time. In the following section, the authors
present the scenarios that motivated their work on Big Image/Video data
and the methods to process it.


TABLE 4.1 Examples of Dimension Based Huge Images.


Image Name                   Image Dimension
Galaxy Image                 1.5 billion pixels
New York City                203200 x 101600 (20 gigapixels)
Sky                          100000 x 50000 pixels
Tokyo Tower                  45 gigapixels
Roppongi Hills Mori Tower    150 gigapixels


TABLE 4.2 Examples of Size Based Huge Images.


Image Name                                                         Image Size
Galaxy Image                                                       4.3 GB
The Garden of Earthly Delights by Bosch                            5.7 GB
Louise Elisabeth Vigee-Lebrun - Marie-Antoinette -
Google Art Project                                                 2.7 GB
Hans Holbein the Younger - The Ambassadors - Google Art Project    3.0 GB

Video Assignments: Today, students are able to use electronic devices
without any difficulty, so they can comfortably submit assignments in
video format. Raju et al. [28] proposed a new technique to review the
assignments submitted by students, who are required to record a video
of themselves explaining the given problem [29]. This process helps the
students improve their learning capabilities. It can also prove useful
for undergraduate science students, who are required to learn theorems
and construe proofs. Another benefit is that it enables teachers to
raise discussion topics that allow students to work out their topics.
Some of the video assignments submitted by students are shown in
Figure 4.4. The problem the authors faced is that some students used
video cutting software to copy from their friends.

Checking a large number (~300) of videos for plagiarism is a tedious
and time-consuming process [30]. The authors therefore decided to
process these videos with state-of-the-art technology: Hadoop with the
MapReduce paradigm.

IEEE BigMM Conference: The IEEE BigMM conference [31], which started in
2015, is another motivating factor for the authors. BigMM stands for
Big Multimedia. The conference invites papers in the domain of
multimedia data that satisfy the characteristics of Big Data.

Surveillance Videos: Applications of Big Data occupy a large space in
research and industry. Video streams from CCTV cameras are one of the
main contributors among the sources of Big Data, and surveillance
videos contribute heavily to unstructured Big Data. CCTV cameras are
installed in many places that demand security. Surveillance capability
and improved security are not possible without technology, and many
technical innovations have come into existence, such as access control
devices, video surveillance, and alarms. In a survey, it was found that
all the respondents either have a video surveillance system installed
(95%) or are planning to install one within the next year (5%). One
respondent reported the largest number of cameras, totaling up to
25000. The average number of cameras in each network increased by
almost 70%, from about 2900 in 2015 to about 4900 in 2018. The newest
survey reports that 20% of respondents have 10000 or more cameras,
whereas just 5% of them did in the previous survey [32].


4.2.1 EXISTING TECHNOLOGIES


The main concerns when processing Big Image Data are (1) storing the
data when it cannot be stored in the existing infrastructure and
(2) processing the data when it cannot be processed with the existing
infrastructure. In some cases, both can be done with the existing
infrastructure, but the process is very time-consuming. To deal with
this, the authors discuss different existing technologies in this
section.


A. Hadoop: Hadoop is a part of the Apache project [33]. It is an
open-source Java-based framework used for storage and processing of
Big Data in a distributed environment.


Storage: Hadoop mainly contains two parts, one for storage and another
for processing. For storing data, it uses a file system known as the
Hadoop Distributed File System (HDFS). If the data cannot fit into the
storage of a single computer, a Hadoop cluster can be formed with n
computers, which gives combined storage. The total storage contributed
by all the computers in the cluster is made available through HDFS. In
this scenario, all the computers that are part of the cluster can
access the data.

Processing: As the data are stored in a distributed file system, a
different programming paradigm is needed to process them, and Hadoop
uses the MapReduce programming paradigm for this. When dealing with
large data, MapReduce is one of the best ways to get results in less
time than on a single system. It is a programming paradigm in which
the execution takes place where the data reside. Execution proceeds in
three stages: Map, Shuffle & Sort, and Reduce. The Map stage takes its
input as <Key, Value> pairs and produces its output as <Key, Value>
pairs as well. The Shuffle & Sort stage then sorts this output by key.
Finally, the Reduce stage consolidates the values for each key and
produces the final output. The distributed file system can be used to
store the data in the intermediate steps. The data can be in any form:
text, images, videos, log data, etc.


B. MATLAB with MATLAB Distributed Computing Server (MDCS): 
MDCS allows users to submit sequential or parallel MATLAB code 
to a cluster from within MATLAB. Before working in a distributed 
environment, MDCS should be installed on all the systems of the 
cluster. The authors have used an MDCS setup with 96 workers 
(cores) in their lab, installed on a cluster of 24 systems with 
4 cores each. The Parallel Computing Toolbox, which contains 
“parpool”, is installed on the head node, which is one of the 
24 systems; that system therefore acts as both head node and 
client node. “parpool” can be used in two different modes: 
(1) local mode, which uses the available cores of a single system, 
and (2) cluster mode, which uses the cores of all the systems in 
the cluster (gcp). Matlab is integrated with Hadoop so that 
parpool can read data from HDFS.* 

C. Spark: Apache Spark is an open-source, general-purpose 
framework for distributed cluster computing. Spark offers an 
interface for execution on the whole cluster with data parallelism 
and fault tolerance. Spark was developed at the University of 
California, Berkeley’s AMPLab. Later, the Spark codebase was 
donated to the Apache Software Foundation. The architectural 
foundation of Spark is the Resilient Distributed Dataset (RDD), 
a read-only multiset of data items distributed over a cluster of 
computers.*° When Hadoop’s MapReduce needs multiple jobs to 
complete a task, the intermediate results of every job are stored 
in HDFS, and reading them back from HDFS for the next job is 
time-consuming. In Spark, the intermediate results are kept in 
memory, so reading the data for the next job saves a lot of time. 
This can make Spark faster by several orders of magnitude than 
the Apache Hadoop MapReduce implementation.***’ The faster 
environment facilitates iterative algorithms, which access their 
dataset multiple times in a loop, and interactive data analysis, 
which repeatedly queries the data. Iterative algorithms include 
the training algorithms of machine learning systems, which gave 
the initial motivation for the development of Apache Spark.** 
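
The Map, Shuffle & Sort, and Reduce stages described for Hadoop under item A can be sketched on a single machine. The following Python snippet is only an illustration of the data flow between the three stages, not Hadoop's implementation; the function names and the two sample lines are assumptions made for this example.

```python
from collections import defaultdict

def map_stage(records):
    """Map: turn each <offset, line> record into <word, 1> pairs."""
    for _, line in records:
        for word in line.split():
            yield word, 1

def shuffle_and_sort(pairs):
    """Shuffle & Sort: group the values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_stage(grouped):
    """Reduce: consolidate the values of each key into one output pair."""
    return [(key, sum(values)) for key, values in grouped]

def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    records = enumerate(lines)
    return reduce_stage(shuffle_and_sort(map_stage(records)))

print(word_count(["big image data", "big data"]))
# [('big', 2), ('data', 2), ('image', 1)]
```

In a real cluster the map and reduce functions run on the nodes that hold the data, and the grouping step moves pairs between nodes; here all three stages run in one process.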

To process Big Image Data, the above-given technologies can 
be used individually or in combination. Sarmad Istephan et al.*’ 
proposed a method to retrieve an image from unstructured medical 
image Big Data with a case study on epilepsy. They used two 
criteria to validate the feasibility of the proposed framework: 
accuracy and ability. The accuracy is tested by executing the query 
on data that contains both structured and unstructured data. To test 
the ability of the framework, the results are compared by executing 
the query on Hadoop clusters of different sizes. The same kind of 
ability is tested in*? also. A novel CBIR framework known as PIC 
was proposed by Lan Zhang et al.,*! where cloud computing is used 
for searching an image in a large image dataset while protecting 
the privacy of the input data. To deal with massive images, 
they designed a system suitable for distributed and parallel 
computation to expedite the search process. Le Dong” proposed 
an effective processing framework named Image Cloud Processing 
(ICP) to deal with data explosion in the image processing field. The 
ICP framework consists of two mechanisms: Static ICP (SICP) and 
Dynamic ICP (DICP). SICP is designed to cooperate with the 
MapReduce paradigm, whereas DICP, implemented through a parallel 
processing procedure, works with the traditional processing 
mechanism of the distributed system. To validate the ICP framework, 
they used the ImageNet dataset. Jiachen Yang***’ used the 
maximal mutual information criterion to reduce the feature vector 
dimension and thereby decrease the retrieval time. 


4.3. MAIN FOCUS OF THE CHAPTER 


The objective of this section is to discuss different methods to handle 
each of the following categories of Big Images. 


• A large number of images (SequenceFile). 

• A single large image with higher dimensions (split into small 
pieces with respect to dimension and run MapReduce on them). 

• A single large image with huge memory (split into small pieces 
with respect to memory and run MapReduce on them). 


4.3.1 METHODS FOR PROCESSING BIG IMAGE DATA 


To achieve the above-mentioned objectives with respect to Big Image 
Data (BID), the MapReduce programming paradigm is used in three 
different ways: local, Hadoop cluster, and Matlab’s Parallel Pool. The 
corresponding options used as the cluster setup in the implementation 
code are 0, cluster, and gcp (get current parallel pool), respectively. 
When “0” is used, the data are taken from the local system and 
Matlab’s MapReduce does the entire job. When “cluster” is used, 
Hadoop’s MapReduce is active and the data can be accessed from HDFS. 
Last, when “gcp” is used, Matlab’s Parallel Pool with MapReduce is 
activated and the data can be taken from HDFS. 


4.3.2. TECHNOLOGIES AND IMPLEMENTATION ISSUES FOR 
PROCESSING BID 


Big Image Data processing (BIDP) has to deal with a large number of 
small files, a situation that Hadoop and other distributed processing 
technologies handle poorly. Processing a large number of small files 
creates a large number of memory references, which generates a lot of 
overhead for the name node in Hadoop. Besides, more files need more 
mappers. The sequence file format solves the problem of processing too 
many small files: many small files are combined into a single sequence 
file, which is used as the input of MapReduce programs. 

The concept of Sequence File is putting each small file into a larger 
single file. Sequence files are binary files containing key-value pairs. They 
can be compressed at the record (key-value pair) or block levels. Because 
sequence files are binary, they have faster read/write than text-formatted 
files. Beyond packaging files into a manageable size, sequence files support 
compression of the keys, the values or both. So the type of compression 
determines the sequence file format. 


• Uncompressed (neither the key nor the value is compressed, that is, 
key/value records are uncompressed) 

• Record compressed (only values are compressed, the key is not 
compressed) 

• Block compressed (both keys and values are compressed) 
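
The idea of putting many small files into one binary file of key-value pairs can be illustrated with a short Python sketch. This is not the Hadoop SequenceFile format (which has its own header, sync markers, and compression options); the length-prefixed record layout, the function names, and the sample file names below are assumptions made only to show the principle.

```python
import io
import struct

def pack_records(files):
    """Pack {name: bytes} into one blob of length-prefixed key-value records."""
    out = io.BytesIO()
    for name, data in files.items():
        key = name.encode("utf-8")
        # 8-byte record header: key length, then value length (big-endian)
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)
    return out.getvalue()

def unpack_records(blob):
    """Iterate over the (name, bytes) records stored in a packed blob."""
    view = io.BytesIO(blob)
    while True:
        header = view.read(8)
        if not header:
            break
        key_len, value_len = struct.unpack(">II", header)
        yield view.read(key_len).decode("utf-8"), view.read(value_len)

small_files = {"img_001.jpg": b"\xff\xd8a", "img_002.jpg": b"\xff\xd8bc"}
blob = pack_records(small_files)
print(dict(unpack_records(blob)) == small_files)  # True
```

A MapReduce job can then read one large container sequentially instead of opening every small file separately, with the file name as key and the file content as value.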


In all the implementations given here, the first MR job converts the 
given image files into sequence files. This can be done in the Hadoop 
environment (.seq files) as well as in Matlab (.mat file). Before looking 
into the process of handling images, the process of handling big text 
data is discussed. 


A. Working with Text Data: In this method, the authors created a 
Hadoop cluster of 116 nodes in total. One of the nodes is the master 
node, whereas the remaining 115 nodes are slave nodes. These 116 
nodes are situated in three different labs of their college. Table 4.3 
shows the configuration of all the nodes in this cluster. The 
configured capacity of this cluster is 28.13 TB. To test the cluster 
performance, the authors uploaded text data of a total size of 1.2 TB 
into HDFS with a replication factor of 5 and executed the standard 
“word-count” example on this data. The time taken for completion is 
8 min 8 sec. The authors have also uploaded the entire process to 
YouTube at https://www.youtube.com/watch?v=CSryEIkNGdk. The 
sample code for this one is given after Table 4.3. 


TABLE 4.3  Hadoop Cluster Configuration with 116 Systems. 

Node type         RAM size  Processor      CPU cores  Processor speed  Operating system   Hadoop version  Location 
Master            8GB       Intel i7-4790  8          3.60GHz          Ubuntu (14.04)-64  2.7.2           Lab1 
Slave1-Slave30    8GB       Intel i7-4790  8          3.60GHz          Ubuntu (14.04)-64  2.7.2           Lab1 
Slave31-Slave68   4GB       Intel i7-4770  8          3.40GHz          Ubuntu (14.04)-64  2.7.2           Lab2 
Slave69-Slave115  4GB       Intel i7-4770  8          3.40GHz          Ubuntu (14.04)-64  2.7.2           Lab3 


% word_count.m 
mapreducer(0); 
datafolder = '/input'; 
files = fullfile(datafolder, '*.txt'); 
ds = datastore(files, 'TextscanFormats', '%s', 'Delimiter', ' ', ... 
    'ReadVariableNames', false, 'VariableNames', 'Word'); 
output_folder = '/output'; 
outds = mapreduce(ds, @mapCountWords, @reduceCountWords, ... 
    'OutputFolder', output_folder); 
readall(outds) 


% mapCountWords.m 
function mapCountWords(data, info, intermKVStore) 
    x = table2array(data); 
    for i = 1:size(x,1) 
        % display the key-value pair which is the output of the mapper 
        disp([string(x(i,1)) 1]); 
        add(intermKVStore, string(x(i,1)), 1); 
    end 
end 


% reduceCountWords.m 
function reduceCountWords(intermKey, intermValIter, outKVStore) 
    sum_occurences = 0; 
    while hasnext(intermValIter) 
        sum_occurences = sum_occurences + getnext(intermValIter); 
    end 
    add(outKVStore, intermKey, sum_occurences); 
end 


B. Working with Large Number of Images: The authors have integrated 
MATLAB with Hadoop and then built a cluster with 1 master node and 
110 slave nodes (5 nodes of the original cluster were removed 
due to memory limitations during MATLAB installation). The 
authors worked on the CBIR problem by considering different 
standard image datasets: Corel 1000, Corel 10000, Brodatz 
Textures, Mirflickr (1,000,000 images), and ImageNet (1,281,167 
images). Three MR jobs are used in this process; they are outlined 
in Figures 4.5-4.7. The entire process is given in Algorithm-1. 


FIGURE 4.5 Outline of MapReduce job 1. [Images 1 to N are read from HDFS; 
Mapper i emits <Filename i, ImageData>; after the Shuffle and Sort phase, the 
Reduce phase writes the <Filename, ImageData> pairs to the output file.] 


FIGURE 4.6 Outline of MapReduce job 2. [The sequence file is read in chunks 
1 to M of <Filename, ImageData> pairs; the Map phase emits <Filename, 
FeatureVector>; the Reduce phase writes the <Filename, FeatureVector> pairs 
to the output file.] 


FIGURE 4.7 Outline of MapReduce job 3. [Inputs are the query feature vector 
and chunks 1 to M of the sequence file; Mappers 1 to M emit <'distance', 
(Filename, distance)>; the Reduce phase writes <Rank, Filename> pairs to the 
output file.] 

Algorithm-1: 
Begin 
Step-1: Store all the images of the dataset into HDFS. 
Step-2: Give all the images to MR_Job1, which gives <FileName, 
ImageData> as its output in the form of a sequence file. 

Step-3: Give the resultant sequence file of Step-2 as input to 
MR_Job2, which gives <FileName, FeatureVector> as its output. 

Step-4: Give the result of Step-3, along with the query image, as input 
to MR_Job3, which gives <Rank, FileName> with respect to the 
query image. 

Step-5: Calculate the performance measures from the rank matrix 
obtained in Step-4. 
End 
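
The data flow of Algorithm-1 can be sketched in a few lines of Python. The source does not state which features or distance the authors used, so the gray-level histogram, the L1 distance, the function names, and the three toy "images" below are all assumptions chosen only to make the three jobs concrete.

```python
def extract_features(pixels, bins=4, levels=256):
    """MR_Job2 stand-in: normalised gray-level histogram (an assumed feature)."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // levels] += 1
    total = len(pixels)
    return [h / total for h in hist]

def l1_distance(a, b):
    """Assumed dissimilarity measure between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def rank_images(feature_store, query_features):
    """MR_Job3 stand-in: emit (rank, filename) ordered by distance to the query."""
    scored = sorted(
        (l1_distance(vec, query_features), name)
        for name, vec in feature_store.items()
    )
    return [(rank, name) for rank, (_, name) in enumerate(scored, start=1)]

# MR_Job1 stand-in: <FileName, ImageData> pairs (tiny made-up pixel lists)
images = {
    "dark.png": [10, 20, 30, 40],
    "bright.png": [200, 220, 240, 250],
    "mixed.png": [10, 20, 200, 220],
}
features = {name: extract_features(px) for name, px in images.items()}
query = extract_features([15, 25, 35, 45])
print(rank_images(features, query))
# [(1, 'dark.png'), (2, 'mixed.png'), (3, 'bright.png')]
```

In the distributed setting, feature extraction is independent per image and so maps naturally onto mappers, while the final ordering needs all distances together, which is why it sits in the reduce phase.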


C. Working with Big Image of Huge Dimension: The authors created 
an image of dimension 32768 x 32768 in Adobe Photoshop, shown 
in Figure 4.8. In their lab, on a system with 8 GB RAM and an i7 
processor, it took around 1 hr 21 min to process the image, that is, 
to count the number of rectangles in it. So, the authors used Hadoop 
with MATLAB to process it with the MapReduce model. The authors 
divided the 32768 x 32768 image into 1024 pieces of size 1024 x 
1024, as shown in Figure 4.9(a), and into 256 pieces of size 2048 x 
2048, as shown in Figure 4.9(b). The pieces were stored in HDFS and 
the MapReduce model was applied to count the number of rectangles. 
It took around 12 min to create a sequence file and 1 min 15 sec to 
complete the process of counting the number of rectangles. 


FIGURE 4.8 Image of dimension 32768 x 32768. 


FIGURE 4.9 (a) 1024 sized block (b) 2048 sized block of Figure 4.8. 
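
Splitting a large image into fixed-size blocks, as in item C, can be sketched as follows. This Python snippet is an illustration under assumptions: the function names are hypothetical, the image is a plain list of pixel rows, and an 8 x 8 toy image stands in for the 32768 x 32768 one. The function is general in the block size.

```python
def tile_boxes(height, width, block):
    """Yield (row, col, top, left, bottom, right) for non-overlapping tiles."""
    for r, top in enumerate(range(0, height, block)):
        for c, left in enumerate(range(0, width, block)):
            yield r, c, top, left, min(top + block, height), min(left + block, width)

def split_image(image, block):
    """Map-side input: key = tile position, value = the tile's pixel rows."""
    h, w = len(image), len(image[0])
    return {
        (r, c): [row[left:right] for row in image[top:bottom]]
        for r, c, top, left, bottom, right in tile_boxes(h, w, block)
    }

# Number of 1024 x 1024 tiles needed to cover a 32768 x 32768 image
tiles_1024 = sum(1 for _ in tile_boxes(32768, 32768, 1024))
print(tiles_1024)  # 1024

# Toy 8 x 8 image split into 4 x 4 tiles
demo = [[x + 8 * y for x in range(8)] for y in range(8)]
pieces = split_image(demo, 4)
print(sorted(pieces))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each tile can then be stored as one key-value record and processed by a separate mapper. Objects that straddle tile borders would need extra handling when counting; the source does not describe how this was treated.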


D. Working with Big Image of Huge Size: The authors created an 
image of size 3.14 GB. When they tried to load the image into 
image processing software, it could not be read and an OUT OF 
MEMORY error was shown. Therefore, the authors divided the 
image into pieces. As the authors’ HDFS chunk size is 64 MB, 
each piece was kept smaller than 64 MB, and the image was 
divided into 100 blocks. Job-1 completed in 3 min 30 sec and 
Job-2 in only 50 sec. 
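
The relation between the image size, the HDFS chunk size, and the number of blocks in item D can be checked with a short calculation. The Python sketch below assumes pieces of equal size in bytes and 1 GB = 1024 MB; neither assumption is stated in the source, and the function name is hypothetical.

```python
import math

def min_pieces(file_size_mb, chunk_size_mb):
    """Smallest number of equal pieces so that no piece exceeds one chunk."""
    return math.ceil(file_size_mb / chunk_size_mb)

image_mb = 3.14 * 1024              # 3.14 GB expressed in MB
print(min_pieces(image_mb, 64))     # 51
print(round(image_mb / 100, 2))     # 32.15 (MB per piece with 100 pieces)
```

Under these assumptions at least 51 pieces are needed to stay below 64 MB, so the 100 blocks used by the authors leave each piece at roughly half a chunk.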


4.4 FUTURE RESEARCH DIRECTIONS 


MapReduce or RDD can be used for distributed computing on a cluster of 
CPU-only computers. Similarly, the same MapReduce or RDD processing can 
be done on parallel GPUs. This is expected to complete executions even 
faster in applications like CBIR, CBIR for multiresolution image datasets, 
and object detection. 


4.5 CONCLUSION 


The authors have shown the process of handling Big Image Data. Three 
different cases are shown: (1) working with a large number of images, 
(2) working with a big image of huge dimension, and (3) working with a big 
image of huge size. The authors have processed different standard image 
datasets, which are large in quantity, to achieve image retrieval tasks using 
the MapReduce paradigm with the data stored in a distributed file system. 
The different modes of parallel execution and the advantage of converting 
the files into sequence files are also discussed. 


ABSTRACT 


Content-based image retrieval (CBIR) and classification algorithms require 
features to be extracted from images. Global and low level image features 
such as color, texture, and shape fail to describe pattern variations within 
regions of an image. Bag of Visual Words approaches have emerged in 
recent years that extract features based on local pattern variations. These 
approaches typically outperform global feature methods in classification 
tasks. Recent studies have shown that Word N-Gram models common in 
text classification can be applied to images to achieve better classification 
performance than Bag of Visual Words methods as it results in more 
complete image representation. However, this adds to the dimensionality 
and computational cost. State-of-the-art deep learning models have been 
successful for image classification. However, the huge amount of training 
data required for these models is a big challenge. This book chapter reviews 
the literature on Bag of Visual Words and N-gram models for image 
classification and retrieval. It also discusses a few cases where the N-gram 
models have outperformed or given comparable performance to the 
state-of-the-art deep learning models. The literature demonstrates that 
N-grams are a powerful and promising descriptor for image representation 
and are useful for various classification and retrieval applications. 


5.1 INTRODUCTION 


Techniques for the automated classification of images rely heavily on 
approaches that transform the image’s digital encoding into features 
derived for the.”° Low level features such as color, texture, and shape were 
proposed for this in the early 1990s. 

Color features such as histograms were common for image classification. 
Texture is information about the arrangement of intensities, and various 
texture features have been used for classification of images. Some of these 
are first order statistical features, which are not able to provide much 
information about spatial correspondence.’? Second order statistical 
features, such as those based on the co-occurrence matrix, are found to be 
powerful in distinguishing among various texture images.' Another texture 
feature used is Local Binary Pattern (LBP), which is sensitive to noise in 
uniform regions.”® Spectral methods such as Gabor Filters*° and Fourier 
Transforms,” which convert the image into a signal by sampling, were also 
popular for image classification tasks. Shape features such as Zernike 
Moments*! have also been used for classification tasks. However, they are 
computationally expensive. 

However, image representation using low level features suffered from 
the semantic gap problem. The semantic gap’? is the gap between the high 
level concepts (for example, “Find pictures of Sunset’’) expected in a user’s 
query and the information modeled by low level features. Moreover, it 
is difficult for a user to search for images using criteria such as color, 
texture, and shape.”* Most importantly, low level features are the global 
image features and represent the image as a whole, but do not give much 
information about local pattern variations. 

The idea of capturing local pattern variations in an image gave rise to the 
use of Bag-of-Visual-Words models (BoVW) for image representation.” 
BoVW model was inspired by Bag-of-Words (BoW) model in the text 
retrieval domain which has been proven to be efficient and is now widely 
deployed.*’ Text documents mainly contain meaningful words and so can 
be represented by a feature vector of counts of various words appearing 
in the document. A BoVW approach was first applied to video retrieval 
by Sivic and Zisserman.” In this approach, an image is described by a 
number of occurrences of different visual words. Visual words are local 
image patterns, which can describe relevant semantic information about 
an image. This model soon became popular for image retrieval and 
classification applications due to its accuracy, !13759-6668:82 

However, there are fundamental differences between text and images. 
First, text words are discrete tokens whereas local image descriptors 
are not. This necessitates techniques to generate a visual vocabulary by 
clustering the local feature descriptors. Vector quantization is a common 
technique for this but, in contrast to the text BoW, the feature vector 
generated is typically highly dimensional and the generation process is 
computationally complex.*' Second, text is unidirectional whereas images 
can be read in several different directions. 

Although the BoVW model has proven to be much better than models 
using low level features such as color and texture,* it has major drawbacks. 
The BoVW model does not consider spatial relationships among visual 
words. Another drawback is the high computational cost of generating 
vocabularies from low level features.” Further, the vocabulary 
construction process often results in noisy words that diminish 
classification performance.” 

The visual N-grams model was first proposed for images in order to 
take spatial relations between visual words into account. There are two 
types of N-gram formulations in the text retrieval context. Word N-grams 
are formed by sequences of N consecutive words in a document, whereas 
character N-grams are formed by sequences of N consecutive characters. 
Examples of word 2-grams are “image processing,” “artificial intelligence,” 
“medical systems,” etc. In contrast, for the phrase “his pool”, with “_” 
denoting the space, the character 3-grams are “his, is_, s_p, _po, poo, ool” 
and the character 4-grams are “his_, is_p, s_po, _poo, pool.” The N-gram 
model has proven to be more accurate than other models in the text 
context.** Therefore, its application to image classification by Pedrosa and 
Traina” promised semantically meaningful image representation. Since then, 
visual N-grams for images have not been widely researched despite 
favorable early results. In addition, pixel N-grams, inspired by character 
N-grams, have only recently been advanced. 
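
The two text N-gram formulations can be reproduced with a few lines of Python. The function names are hypothetical and the underscore stands for the space character, following the examples in the paragraph above.

```python
def char_ngrams(text, n, space="_"):
    """All runs of n consecutive characters, with spaces shown as `space`."""
    s = text.replace(" ", space)
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_ngrams(text, n):
    """All runs of n consecutive words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(char_ngrams("his pool", 3))
# ['his', 'is_', 's_p', '_po', 'poo', 'ool']
print(char_ngrams("his pool", 4))
# ['his_', 'is_p', 's_po', '_poo', 'pool']
print(word_ngrams("medical image processing", 2))
# ['medical image', 'image processing']
```

A sequence of L tokens yields L - N + 1 N-grams, which is why longer N-grams give fewer but more specific features.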

This chapter provides detailed information on visual N-grams 
in relation to BoVW models for image classification and retrieval 
applications so that the differences between these two approaches can be 
clearly described. It also discusses work using BoVW and N-gram 
approaches that has outperformed the state-of-the-art deep learning 
approaches. This book chapter is organized as follows: Section 5.2 
describes local features used for constructing image models for BoVW 
and N-grams. Section 5.3 describes the vocabulary/dictionary creation to 
highlight the source of the computational complexity. Section 5.4 outlines 
various approaches for visual N-gram generation to analyze claims of 
computational efficiencies. Some of the major challenges of BoVW and 
N-gram approaches are discussed in Section 5.5. Section 5.6 mentions 
various deep learning approaches for image classification. Section 5.7 
concludes the chapter. 


5.2 LOCAL FEATURE EXTRACTION FOR BOVW AND N-GRAMS 


The BoW model was first introduced in the text retrieval and categoriza- 
tion domain where a document is described by a set of keywords and their 
frequency of occurrence in the document. The same idea was applied to 
the image domain and has been quite successful.” Here, the idea is to 
represent an image using a dictionary of different visual words. Images 
are quite different from text documents in the sense that there is no natural 
concept of a word in case of images.* Thus, there is a need to break down 
the image into a list of visual elements. Moreover, as the number of possible 
visual elements in an image could be enormous, these elements should be 
discretized to form a visual word dictionary known as a codebook. 

Vocabulary construction has been achieved mainly using two approaches: 
local, patch-based approach or dense sampling*** and key point-based 
approach or sparse sampling.'*°** In the patch-based approach, the image 
is divided into a number of equal sized patches by using a grid. Local 
features are then computed for each patch separately. Keypoints are the 
centers of salient patches generally located around the corners and edges. 
Keypoints are also known as interest points and can be detected using 
various region detectors such as the Harris—Laplace detector (corner-like 
structures), Hessian-affine detector,” Maximally stable extremal regions or 
the Salient regions detector.* Local features are then computed for each 
interest point. 

Some of the state-of-the-art local feature descriptors used for modeling 
texture information include Scale Invariant Feature Transform (SIFT),™ 
Speeded Up Robust Features (SURF),° Histogram of Oriented Gradients 
(HOG),'® Local Ternary Pattern (LTP),’* and Discrete Cosine Transform 
(DCT).'° Color hues and shape features have also been used as local feature 
descriptors by some researchers. These local feature descriptors are 
briefly described below. 

SIFT descriptors* are invariant to image translation, scaling, and 
rotation, and partially invariant to illumination changes. These features 
are robust to noise and local geometric distortion® and are the most 
commonly used local feature descriptors for the BoVW model. However, 
the limitations include high computational cost and the huge feature vector 
dimension (128 dimensions for each keypoint). 

SURF features® are modified SIFT features. SURF features are high- 
performance, scale and rotation-invariant and they outperform SIFT 
features with respect to repeatability, distinctiveness, and robustness.*4 
Computation time for calculating SURF features is reduced with the use 
of a fast Hessian matrix-based detector and a distribution-based descriptor. 
SURF descriptors have been successfully applied for diabetic retinopathy 
lesion detection,* video stabilization,” video copy detection,*’ recognition 
of museum objects,° and multi-person tracker.”° 

HOG’ is also a simplified form of SIFT. It differs from SIFT in 
that it is computed on a dense grid of uniformly spaced cells and uses 
overlapping local contrast normalization. It calculates intensity gradients 
from pixel to pixel and selects a corresponding histogram bin based on 
gradient direction. The key advantage of HOG is that it is invariant to 
geometric and photometric transformations and is more accurate than 
wavelets as well as SIFT® for human detection and scene 
categorization. However, HOG descriptors are dependent on the angle of 
the acquisition camera." 

Another popular feature used for capturing texture information is 
LTP.” These features are calculated using the thresholded difference 
between the gray value of a pixel and the gray values of P neighboring 
pixels on a circle of radius R around it. ”* They have been used for different 
applications such as texture classification, face recognition, and background 
subtraction in complex scenes.°° Advantages of LTP include rotation 
invariance and less sensitivity to noise, as a small pixel difference is 
encoded into a separate state. To reduce the dimensionality, the ternary 
code is split into two binary codes: a positive LBP and a negative LBP. 
However, this splitting may cause significant information loss. 
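
The ternary coding and its split into a positive and a negative binary code can be sketched for a single pixel. The Python snippet below is a minimal illustration under assumptions: it uses the 8 neighbors of a 3 x 3 patch instead of P interpolated points on a circle, the threshold t = 5 is arbitrary, and the function name and patch values are made up.

```python
def ltp_codes(patch, t=5):
    """Return (ternary, positive_code, negative_code) for a 3x3 gray patch.

    A neighbor is coded +1 if it is at least t above the center, -1 if it is
    at least t below the center, and 0 otherwise (the "separate state").
    """
    center = patch[1][1]
    # 8 neighbors, clockwise from the top-left corner
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    ternary = [1 if p >= center + t else -1 if p <= center - t else 0
               for p in neighbors]
    # Split into two binary patterns: one for the +1 states, one for the -1 states
    positive = sum(1 << i for i, s in enumerate(ternary) if s == 1)
    negative = sum(1 << i for i, s in enumerate(ternary) if s == -1)
    return ternary, positive, negative

patch = [[54, 57, 90],
         [40, 55, 52],
         [70, 49, 61]]
print(ltp_codes(patch, t=5))
# ([0, 0, 1, 0, 1, -1, 1, -1], 84, 160)
```

Neighbors within t of the center (54, 57, 52) fall into the zero state, which is what makes the code less sensitive to small noise; after the split, each binary code alone no longer records which neighbors were in that zero state.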

Features using a spectral approach (frequency domain) such as Discrete 
Cosine Transform (DCT) are also used for image classification applications. 
Some of these applications include histology image classification,'* 
detection of pornographic video content,*' object classification,!® HEp-2 
image classification,” and histopathology image classification.” These 
features outperformed SIFT descriptors.'* An advantage of DCT is energy 
compaction, that is, the capacity to pack the energy of spatial sequences 
into as few frequency coefficients as possible, which reduces the feature 
vector dimensionality. The main disadvantage of DCT is the blocking effect. 
When the image is compressed at higher compression ratios, the blocks 
become visible, degrading the picture quality. 
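
Energy compaction can be observed on a single row of pixels. The Python sketch below computes a one-dimensional orthonormal DCT-II directly from its definition and measures how much of the signal energy falls into the first coefficients; the function names and the sample intensity ramp are assumptions for illustration, and real descriptors use two-dimensional block transforms.

```python
import math

def dct_ii(signal):
    """Orthonormal 1-D DCT-II computed directly from the definition."""
    n_samples = len(signal)
    coeffs = []
    for k in range(n_samples):
        scale = math.sqrt((1 if k == 0 else 2) / n_samples)
        coeffs.append(scale * sum(
            x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(signal)))
    return coeffs

def energy_share(coeffs, keep):
    """Fraction of the total squared-coefficient energy in the first `keep` terms."""
    energy = [c * c for c in coeffs]
    return sum(energy[:keep]) / sum(energy)

row = [100, 102, 104, 106, 108, 110, 112, 114]   # a smooth intensity ramp
coeffs = dct_ii(row)
print(energy_share(coeffs, 2))   # close to 1 for a smooth signal
```

Because the transform is orthonormal, the coefficient energy equals the pixel energy, so keeping only the first few coefficients of a smooth block discards very little of it. This is the property that allows a short feature vector.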

The features described above are texture features and are mainly based 
on intensity distribution of the pixels in an image. Apart from texture, 
color is also a powerful image descriptor. Use of Color Hues (Hue, Satu- 
ration and Intensity) along with the N-gram concept was first proposed 
by.”! This technique preserves some of the spatial color correlates within 
an image to provide a more selective matching mechanism than global 
color histograms. Here, images are encoded with respect to a codebook 
of features which describes every possible combination of a fixed number 
of coarsely quantized color hues that might be encountered within local 
regions of an image. This enables images to be compared on the basis of 
their shared adjacent color artifacts or boundaries. This approach is analo- 
gous to a technique employed in text retrieval systems which use character 
substrings as the basis of the indexing and matching mechanism.” 

Shape is another important property for representation of an image. 
Various shape features such as Zernike moments, wavelet transforms, 
and B-splines have been widely used for image representation. These 
features are based on mathematical formulations and have little to do with 
human visual perception. Perceptual features are higher level represen- 
tations which try to capture richer semantic content and exploit human 
visual perception rules. The idea of perceptual shape features was used 
by Mukanova et al.,°° for creating shape N-grams. These N-gram-based 
perceptual shape features can efficiently represent global shape informa- 
tion in an image and are seen to significantly increase the performance 
of the SIFT-based BoVW approach. The main drawbacks of the shape 
features are the computational cost and the requirement for segmentation. 

Two main sampling strategies are employed in order to compute 
the above-mentioned local features. The SIFT, SURF, and shape-based 
features are considered to be sparse descriptors or keypoint-based 
descriptors, whereas HOG, LTP, DCT, and Color Hues are considered dense 
descriptors or patch-based descriptors. Dense sampling descriptors have 
been reported to outperform sparse descriptors, as some of the information 
is lost in the keypoint-based approach.’° 


5.3. VOCABULARY CONSTRUCTION 


After calculation of local features, the next step in the BoVW or N-gram 
representation of an image is the vocabulary construction. Since an image 
does not contain discrete visual words, a challenging task is to discover 
meaningful visual words. This can be achieved by clustering local features 
so that cluster centroids can be treated as visual words. Various clustering 
algorithms such as Generalized Lloyd Algorithm (GLA), Pairwise Nearest 
Neighbor Algorithm (PNNA) and K-means Algorithm have been widely 
used for this purpose.” However, GLA is computationally complex and 
cannot guarantee an optimal codebook generation.” On the other hand, 
PNNA is more efficient than GLA but slightly inferior to GLA in terms 
of optimality.?!*’ Further, the K-means algorithm performs better than 
the hierarchical algorithms in terms of accuracy and computation time. It 
differs from the GLA in that the input for k-means algorithm is the discrete 
set of points rather than continuous geometric region. This algorithm 
partitions N number of local features into K clusters in which each feature 
belongs to the cluster with the nearest mean. This is the most commonly 
used algorithm for visual codebook generation.49!64¢525865,75:83.90 
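
Vocabulary construction with K-means and the resulting BoVW histogram can be sketched in plain Python. This is a minimal illustration, not a production clustering routine: the initial centroids are simply the first K descriptors (an assumption), the descriptors are made-up 2-D points rather than 128-dimensional SIFT vectors, and all function names are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def kmeans(points, k, iterations=10):
    """Plain K-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def bovw_histogram(descriptors, centroids):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1
    return hist

descriptors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
vocabulary = kmeans(descriptors, k=2)        # cluster centroids = visual words
print(bovw_histogram([(0.1, 0.1), (4.8, 5.0), (5.1, 5.2)], vocabulary))
# [1, 2]
```

The histogram is the BoVW feature vector of an image: its length equals the vocabulary size K, regardless of how many local descriptors the image produced.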

The approaches for vocabulary construction can be grouped under two 
main categories: global dictionary and sub-dictionary. If a single 
dictionary of visual words is created using all the images in the collection, it 
is called a global dictionary.*!8°?°6.75.70.9! In contrast, the sub-dictionary 
approach considers the subset of visual words that best represents a specific 
image class; such words are also known as region-specific visual words. For 
example, in diabetic retinopathy images, two sub-dictionaries related to 
lesion and no-lesion classes can be separately created.*? The sub-dictionary 
approach can improve classification as well as retrieval performance over 
the global dictionary approach.” 

Creation of a visual N-gram codebook can be more challenging than 
BoVW codebook creation. This is because, as opposed to text, 
an image can be read in many different directions (horizontal, vertical, 
at an angle of θ degrees). Further, visual N-grams that have the same 
order but different orientations may be related to the same pattern. One 
approach to generating rotation invariant N-gram codebooks can 
be seen in the work of Lopez-Monroy et al.°? Moreover, as N increases, 
the dictionary size increases tremendously if we consider all possible 
combinations of visual words in all possible directions. 
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5.4 DIFFERENT APPROACHES FOR VISUAL N-GRAMS 


Although BoVW generates promising results in image retrieval and
classification tasks, loss of spatial information and creation of noisy words
are two major drawbacks of this approach. The loss of spatial
information can be overcome by using visual N-grams. A visual N-gram
is a descriptor obtained by grouping visual words so that the arrangement
of the visual words in an image is encoded. This matters because
the appearance of visual words can change profoundly when they
participate in relations. Further, N-gram models for image features
are simple and can scale up the content representation just by
increasing N.

By analogy to the text document (see Figure 5.1), there are mainly
two approaches for visual N-gram image representation. Visual Word
N-grams model the spatial relationships among the visual words. In
contrast, Visual Character or Pixel N-grams model the relationships
among the pixels in various directions. These approaches are detailed
below.

FIGURE 5.1 Text and image N-gram analogy.


5.4.1 KEYPOINT-BASED N-GRAMS


Keypoints (interest points) are the points of local maxima and minima of
the difference-of-Gaussian function. These keypoints are described with the
help of SIFT features and clustered to construct the visual vocabulary.
The centroids of the clusters represent visual words. The N-gram dictionary
is then created by considering N neighboring visual words in all possible
directions. As N increases, a more complete representation
of an image is generated. Here, the authors used 1-, 2-, and 3-grams to
analyze retrieval precision as well as classification accuracy on various
databases, namely Corel 1000, a Lung database, a Medical Image Exams
database, and a Texture database. These experiments show that the visual word
N-grams (bag-of-visual-phrases) approach improved retrieval precision by up
to 44% and classification accuracy by up to 33%. However, the use of visual
words to represent an image in this way may involve a loss of fidelity to
visual content, since two local features associated with the same visual
word are used in the same way to construct the image signature, whether
they are identical or noticeably different. An approach for generating a
more realistic image signature that considers the differences between textual
words and visual words has also been proposed.
Further examples of the use of keypoint-based N-grams are large-scale
image retrieval, automatic learning of visual phrases, classification of
images in the Caltech dataset, and biomedical image classification.

For visual characterization, the frequency of occurrence of visual words
and the spatial information between the visual words are equally
important. A major challenge in using word N-grams is the dimensionality
and hence the computational cost. The number of all possible
N-grams increases exponentially with N. That is, given a
dictionary with m words, the number of all possible N-grams is m^N.
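
To make the growth concrete, the short sketch below evaluates m^N for a dictionary of 1,000 visual words. The dictionary size is an assumption chosen only for illustration, and the function name is hypothetical.

```python
def ngram_vocabulary_size(m, n):
    """Number of distinct ordered N-grams over a dictionary of m visual words."""
    return m ** n


# Assumed dictionary size of 1,000 words, for illustration only.
for n in (1, 2, 3):
    print(n, ngram_vocabulary_size(1000, n))
# prints:
# 1 1000
# 2 1000000
# 3 1000000000
```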

An effective and efficient technique to extract the frequency
and appearance of visual words has been proposed by Pedrosa et al. In
this approach, 2-grams are generated by placing a circular region over
each keypoint. All pairs of words in this region formed with the centre
point are 2-grams. Two bags of 2-grams are then generated: one bag for
2-grams with angle within [-135, 135] and [-45, 45], and another bag for
2-grams with angle within the interval [135, 45] and [-135, -45]. Then the
frequency of 2-grams in each bag, according to the dictionary of 2-grams, is
recorded. This is called the bag-of-2-grams approach. The results demonstrate
that the classification accuracy is improved by 6.03% as compared to the
BoVW approach. Further, this approach computes the Shannon entropy
over a random "bunch" of 2-grams and demonstrates that the dimensionality
can be significantly reduced.
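
The bag-of-2-grams construction can be sketched roughly as below. This is a sketch under stated assumptions, not the cited authors' implementation: the angle intervals are interpreted here as "within 45 degrees of the horizontal axis" versus "the rest", the pairs are treated as ordered (centre word, neighbour word), and the function name and the `radius` parameter are hypothetical.

```python
import math
from collections import Counter


def bags_of_2grams(points, words, radius):
    """Count 2-grams (centre word, neighbour word) in two orientation bags.

    points: list of (x, y) keypoint positions; words: visual-word id per keypoint.
    """
    near_horizontal, near_vertical = Counter(), Counter()
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Only neighbours inside the circular region around keypoint i count.
            if i == j or math.hypot(xj - xi, yj - yi) > radius:
                continue
            angle = math.degrees(math.atan2(yj - yi, xj - xi))  # in (-180, 180]
            pair = (words[i], words[j])
            # Assumed split: within 45 degrees of the x-axis vs. everything else.
            if abs(angle) <= 45 or abs(angle) >= 135:
                near_horizontal[pair] += 1
            else:
                near_vertical[pair] += 1
    return near_horizontal, near_vertical
```

Each returned counter would then be turned into a frequency vector over the 2-gram dictionary, giving the two bags that are compared between images.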

Keypoint-based approaches identify points throughout the image that
are used as reference points from which N-grams are generated in one
way or another. Local patch-based N-grams, discussed next, replace the
keypoints with regions called keyblocks.


5.4.2 LOCAL PATCH-BASED N-GRAMS 


In this approach, an image is divided into small local patches using a 
grid. Local features are computed for each patch separately. A codebook 
or dictionary of visual words is then created by clustering all the patch 
descriptors. An N-gram codebook is then built by considering
N consecutive visual words present in an image.
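
Assuming the patch descriptors have already been clustered and every grid cell has been replaced by the index of its visual word, the N-gram counting step can be sketched as follows. The function name and the restriction to horizontal and vertical reading directions are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def grid_ngrams(code_matrix, n, direction="horizontal"):
    """Count N-grams of N consecutive visual-word indices along rows or columns."""
    codes = np.asarray(code_matrix)
    if direction == "vertical":
        codes = codes.T  # reading columns top to bottom == reading rows of the transpose
    counts = Counter()
    for row in codes:
        for start in range(len(row) - n + 1):
            counts[tuple(int(w) for w in row[start:start + n])] += 1
    return counts
```

For example, the 2 x 3 code matrix `[[1, 2, 1], [2, 1, 2]]` yields the horizontal 2-grams (1, 2) and (2, 1) twice each.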

The idea of N-grams using local patches was first proposed by Zhu et
al. and was called the keyblock approach. Keyblocks are similar to
keywords in text, and using these keyblocks, images can be represented as a
code matrix in which the elements are indices of keyblocks in the codebook.
Uni-block, Bi-block (horizontal, vertical, diagonal), and Tri-block
(horizontal, vertical, diagonal, triangular) configurations were used. The
disadvantage of the Bi- and Tri-block models is the increased dimension of
the feature vector, which requires large storage and, because the vector is
highly sparse, reduces efficiency and retrieval performance. However, the
dimensionality can be reduced by selecting only useful Bi- and Tri-blocks.
It is reported that combining Uni-, Bi-, and Tri-blocks improves
retrieval performance. Experiments were conducted on the Brodatz texture
database (TDB) and on CDB (a snapshot of images on the web). The keyblock
approach was compared with the traditional color histogram and color coherent
vector techniques using CDB, and with Haar and Daubechies wavelet
texture techniques using TDB. Using the keyblock approach, 12% of all
relevant images are among the top 100 retrieved images, as compared to 9% for
the color histogram and 6.5% for the color coherent vector. Also, at each
recall level the keyblock approach achieved higher precision. The study
also observed that the keyblock approach outperforms the Haar
and Daubechies wavelet texture approaches.

More recently, local patch-based N-grams were used for histopathological
image classification. The local patches were represented using DCT
features. Here, the main idea was to produce N-grams ignoring the
orientation in which they appear. Visual N-grams that have the same order
but different orientation (e.g., if an image is rotated), like 12-65-654 and
654-65-12, are considered the same, thus making the N-gram features rotation
invariant. Another main idea in this study was to combine the N-gram
features, such as 1 + 2 grams, 1 + 2 + 3 grams, and 1 + 2 + 3 + 4 grams. The
1 + 2 gram combination produced the highest classification accuracy of
64.31%. The reason is that longer sequences produce a large vocabulary,
resulting in sparse feature vectors. The results reinforce the finding that
N-grams outperform the BoVW technique. Composing simple image descriptions
using patch-based N-grams can be seen in Li et al. It has been observed
that keypoint-based samplers such as Harris-Laplace work well for small
numbers of sampled patches; however, they cannot compete with uniform
random patch-based sampling using larger numbers of patches for best
classification results.
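
One simple way to treat an N-gram and its reversed reading (such as 12-65-654 and 654-65-12) as the same feature, and to combine N-grams of several orders, is sketched below. The canonical-form trick and the function names are assumptions for illustration; the cited study may implement orientation invariance differently.

```python
from collections import Counter


def canonical(ngram):
    """Map an N-gram and its reversed reading to one representative."""
    ngram = tuple(ngram)
    return min(ngram, ngram[::-1])


def combined_ngram_counts(sequences, max_n):
    """Count orientation-insensitive N-grams for N = 1..max_n over word sequences."""
    counts = Counter()
    for seq in sequences:
        for n in range(1, max_n + 1):
            for start in range(len(seq) - n + 1):
                counts[canonical(seq[start:start + n])] += 1
    return counts
```

Calling `combined_ngram_counts(sequences, 2)` gives a 1 + 2 gram representation; the same sequence read in either direction produces identical counts.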


5.4.3 COLOR N-GRAMS 


Color features have been used for CBIR because they can be easily extracted
and are powerful descriptors for images. Color histograms representing the
relative frequency of color pixels across the image are common for CBIR.
However, they only convey global image properties and do not represent
local color information. In the Color N-grams approach, an image is
represented with respect to a codebook that describes every possible
combination of a fixed number of coarsely quantized color hues. This
allows comparison of images based on shared adjacent color objects or
boundaries. N-gram samples were taken to be 25% of the total number of
pixels in an image. The dataset included 100 general color images of faces,
flowers, animals, cars, and aeroplanes. The results were compared with
the approach adopted by Faloutsos et al. The average rank of all relevant
images was reported to be 2.4, as compared to 2.5 for the baseline.
Also, the number of relevant images missed was 1.9, as compared to 2.1
for the baseline. A limitation of this study is that the quantization of the
hues does not match the sensitivity of the human color perception model.
Another limitation was the very small database used. However, further
work has demonstrated that this approach can also be used for very large
databases. Moreover, this approach is less sensitive to small spectral
differences and is not prone to color constancy problems.


5.4.4 SHAPE N-GRAMS 


The concept of N-grams has been used to group perceptual shape features
to discover a higher-level semantic representation of an image. Here,
low-level shape features are extracted and perceptually grouped using the
Order Preserving Arctangent Bin (OPABS) algorithm advanced by Hu and
Gao. This is based on the perceptual curve partitioning and grouping (PCPG)
model. In the PCPG model, each curve is made up of Generic Edge
Tokens (GETs) connected at Curve Partitioning Points (CPPs). Each GET
is characterized by the monotonic characteristics of its Tangent Function (TF)
set. The extracted perceptual shape descriptors are categorized as one of
eight generic edge segments.

Gao and Wang's model is based on the Gestalt theory of perceptual
organization, which states that humans perceive objects as a whole.
The authors define a shape N-gram as a continuous subsequence of GETs
connected at CPPs. There are three main cases of how the GETs
are connected at a CPP. The first is a curve segment connected to
another curve segment (CS-CS), the second is a line segment connected
to a line segment (LS-LS), and the third is a curve segment connected to a
line segment (CS-LS). Here, four N-gram based perceptual feature vectors are
proposed, which encode local and global shape information in an image.
The Caltech256 dataset was used for classification experiments. Results
show that the combination of shape N-grams with a conventional SIFT
vocabulary achieves around 8% higher classification accuracy as compared
to a SIFT-based vocabulary alone.

Further, the development of CANDID (Comparison Algorithm for
Navigating Digital Image Databases) was inspired by the N-gram approach
to document fingerprinting. Here, a global signature is derived from various
image features such as localized texture, shape, or color information. A
distance between probability density functions of feature vectors is used
to compare the image signatures. Global feature vectors represent a single
measurement over the entire image (e.g., dominant color, texture), whereas
the N-gram approach allows for the retention of information about the relative
occurrences of local features such as color, gray scale intensity, or shape. Use
of probability density functions can reduce the problem of high dimensions;
however, they are computationally more expensive than histogram-based
features. It is observed that subtracting a dominant background from
every signature prior to comparison has no effect when a
true distance function is used, whereas with a similarity measure such as
nSim(I1, I2), dominant background subtraction has a dramatic effect. The
experiments were conducted on satellite data (100 Landsat TM images) and
pulmonary CT imagery (220 lung images from 34 patients). Experimental
results show good retrieval precision.

The word N-gram approaches are divided into keypoint-based and local
patch-based approaches according to the sampling strategy used, and into
color N-grams and shape N-grams according to the local features used.
Another concept from the text retrieval domain, character N-grams,
has also been applied recently to image representation.
This is described below.


5.4.5 VISUAL CHARACTER/PIXEL N-GRAMS 


In the text retrieval context, character N-grams are phrases formed by N
consecutive characters. For languages such as Chinese, where there are no
specific word boundaries, character N-grams have resulted in higher
retrieval accuracies and have been found more efficient than the word N-gram
model in several cases. If we consider every pixel in an image as a
character, the character N-gram concept from text retrieval can be easily
applied to image representation. A first attempt to apply the character
N-gram concept to mammographic image classification showed promising
results. Further experiments showed that
visual character N-grams (Pixel N-grams) outperform the traditional
co-occurrence matrix-based features for classification of mammograms.
Moreover, computing the character N-gram features was found to be seven
times faster than computing the co-occurrence matrix features. Pixel N-grams
also show improved classification performance compared with BoVW
in a texture classification experiment, with the added advantages of
simplicity and lower computational cost. Thus, Pixel N-grams try to overcome
the two drawbacks of visual word N-grams, namely computational
complexity and feature vector dimensionality.
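
A minimal sketch of the Pixel N-gram idea is given below, treating each pixel as a "character" and counting sequences of N horizontally consecutive gray levels. The gray-level reduction step, the horizontal reading direction, the parameter names, and the default values are assumptions for illustration and are not taken from the cited experiments.

```python
from collections import Counter

import numpy as np


def pixel_ngram_histogram(image, n, levels=8, max_value=255):
    """Histogram of N horizontally consecutive pixels after gray-level reduction."""
    img = np.asarray(image, dtype=np.int64)
    # Reduce to `levels` gray levels so the number of possible N-grams stays small.
    reduced = (img * levels) // (max_value + 1)
    counts = Counter()
    for row in reduced:
        for start in range(len(row) - n + 1):
            counts[tuple(int(v) for v in row[start:start + n])] += 1
    return counts
```

With `levels` gray levels there are at most `levels ** n` distinct Pixel N-grams, so the feature vector length is controlled directly by these two parameters and no clustering step is needed.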


5.4.6 VISUAL SENTENCE APPROACH


A new representation of images that goes further in the analogy with textual
data, called visual sentences, has been proposed by Tirilly et al. A visual
sentence allows visual words to be read in a certain order.
An axis is chosen for representing an image as a visual sentence, so that
(a) it is at an orientation fitting the orientation of the object in the image, and
(b) it is at a direction fitting the direction of the object. The keypoints are
then projected onto this axis using orthogonal projection. In this work,
SIFT descriptors are used and keypoint detection is achieved using the
Hessian-affine detector. The main problem is to decide the best axis for
projection. Experiments include five different axis configurations: one PCA
axis, two orthogonal PCA axes, 10 axes obtained by successive rotations of 10
degrees of the main PCA axis, the X-axis, and finally one random axis. Results
show that the approach with the X-axis outperforms those with the PCA axis
on classification tasks. This is because the PCA axis is biased by
background clutter. However, the PCA axis takes spatial relations into account
and outperforms the random axis and the multiple-axis configurations.
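
The single-PCA-axis configuration can be sketched as follows: keypoints are projected orthogonally onto the main PCA axis and their visual words are read in order of the projected coordinate. This is an illustrative sketch only; the function name is hypothetical, and the sketch does not resolve the direction of the axis (an eigenvector's sign is arbitrary), so the sentence may come out reversed.

```python
import numpy as np


def visual_sentence(points, words):
    """Order visual words by the projection of their keypoints on the main PCA axis."""
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)
    # Main PCA axis = eigenvector of the covariance matrix with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    axis = eigvecs[:, np.argmax(eigvals)]
    projection = centred @ axis  # scalar coordinate of each keypoint along the axis
    order = np.argsort(projection, kind="stable")
    return [words[i] for i in order]
```

Replacing `axis` with the unit vector `[1.0, 0.0]` would give the X-axis configuration, which amounts to reading the keypoints from left to right.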


5.4.7 CONTEXTUAL BAG-OF-WORDS 


Two relations between local patches in images or video keyframes can be
important for categorization. First, there is the semantic conceptual relation
between patches, that is, the relation of appearing on the "same part,"
"same object," or "same category." Examples are "wheel of a motorbike,"
"window of a house," and "eye of a human." Further, semantic relations can be
interpreted at multiple levels, for example, patches of the same scene, object,
object part, and so on. Second is the spatial neighborhood relation. Patches
that combine to form a meaningful object or object part are
considered to have a spatial neighborhood relation. These two types of
relations are called contextual relations. The traditional BoW model neglects
the contextual relations between local patches. Nevertheless, it is well
known that contextual relations play an important role in recognizing
visual categories from their local appearance. A contextual bag-of-words
(CBOW) model considers both types of relationships between local patches. On
the 15 scene database, the classification accuracy using CBOW is found to
be significantly better than that of the traditional BoVW model.

The major problem of the BoVW and N-gram approaches is the feature
vector dimension and, hence, the computational cost. Various ways to achieve
dimension reduction are discussed in the next section.


5.5 CHALLENGES OF VISUAL N-GRAM APPROACHES 


Table 5.1 displays a summary of various N-gram approaches for
image classification and retrieval applications. Despite the early success
of the BoVW and N-gram models with regard to classification and retrieval
performance, the use of these models faces a few critical challenges.

One of the major challenges in representing images using the BoVW
and visual N-gram models is the construction of the codebook or visual
word dictionary. Two main types of codebook generation can be
observed. The first is a global dictionary, where the patches or keypoints in
the entire image are clustered to create a single dictionary (Avni et al.,
2010). The second approach uses sub-dictionaries or region-specific BoW
(Jelinek et al., 2013; Pedrosa et al., 2014; Wei Yang et al., 2012). Using
sub-dictionaries has been shown to boost the mAP by 2% and classification
accuracy by 4.25% as compared to the global dictionary representation
(Pedrosa et al., 2014). Furthermore, the choice of clustering algorithm plays
an important role in visual codebook creation; examples are GLA, PNNA,
and the k-means algorithm, of which k-means is the most commonly used
clustering algorithm due to its efficiency and optimality.

Another challenge is to reduce the dimensionality of the feature vector
and thereby the computational cost. This can be achieved either
by reducing the dictionary size or by reducing the size of the feature vector
representing the local patch or keypoint. One way to reduce the dictionary
size is to eliminate the visual words common to all the categories of
images, as they add very little discriminating power to the feature vector.
Another way to reduce the dictionary size is to eliminate the noisy words
created due to the coarseness of the vocabulary construction process.
These noisy words could be eliminated with the help of probabilistic latent
semantic analysis. The dictionary size could also be reduced by ignoring
the N-grams with repeated visual words and treating an inverted N-gram
as the same visual phrase (Pedrosa and Traina, 2013). For reducing the
dimensionality, an approach called "bunch of 2-grams" has been proposed
by Pedrosa et al. (2014). Here, the feature vectors are grouped in bunches
and each bunch is represented using its Shannon entropy. A dimensionality
reduction of up to 99% can be achieved using this approach.

TABLE 5.1 Summary of Various N-gram Approaches.

Author | Year | Model | Local features | Dataset | Advantages | Application
Kelly et al. | 1994 | CANDID | 5 one-dimensional kernels | Pulmonary CT scans | Features invariant to rotation | Retrieval of pulmonary diseased cases
Rickman and Rosin | 1996 | Color N-grams | Color hues (hue, saturation, and intensity) | 100 color images | Robust to noise; rapid fuzzy matching of color images | Retrieval of color images
Soffer | 1997 | NxM grams | N x M grams absolute count and frequency | Fingerprint, floorplans, comics, animals, etc. | Works well on simple images such as floorplans, music notes, comics | Image categorization
Zhu et al. | 2000 | Key-block | - | CDB: 500 web color images divided into 41 groups; TDB: 2240 gray scale Brodatz texture images divided into 112 categories | Superior to color histogram, color coherent vector, Haar and Daubechies wavelet texture approach | Image retrieval
Zhu et al. | 2002 | n-block | Bi-block and Tri-block | Corel: 31646 images; CDB: web color images; Brodatz texture database | Superior to color histogram, color coherent vector, Haar and Daubechies wavelet texture features | Image retrieval
Lazebnik et al. | 2006 | Pyramid matching | SIFT | Caltech-101, Graz | Improved performance over the orderless representation | Recognising natural scene categories
Zhang et al. | 2006 | Bag of Visual Phrases | SIFT (salient local patch) | Caltech-101 | BoVP approach is 20% more effective than BoVW | Retrieve images containing desired
Wu et al. | 2007 | Extended document model | Raw pixel value, SIFT, texture histogram | Caltech, Corel | Robust to translation, illumination variance, viewpoint change, and complex backgrounds | Image classification
Tirilly et al. | 2008 | Visual sentence | SIFT | Caltech-101 | Visual sentences are independent of rotation and scaling | Image classification
Li et al. | 2011 | N-grams | Objects, attributes, spatial relationships | PASCAL2010 | Web-scale N-grams can be used to create sentences to annotate an image | Image annotation sentence
Li, Mei, Kwen, Hua | 2011 | Contextual bag-of-words | SIFT (dense local patches) | TRECVID2005 | Superior performance to conventional BoVW | Video event and scene categorization
Dai et al. | 2013 | Visual groups | Master feature and member features | Oxford Buildings 5k (5062 images); Flickr 1M: images of famous landmarks | Outperforms the BoVW model; inclusion relationship is invariant to image transformations | Large-scale image retrieval
Pedrosa et al. | 2013 | BoVP | SIFT | Corel 1000: 1000 images; Lung CT: 234 images; Medical exams database: 2200 x-ray and MRI | BoVP improves retrieval precision by up to 44% and classification rate by up to 33% compared to BoVW | Image classification and retrieval
Wang et al. | 2013 | BoVP | HOG, shape context, SIFT | CTC data from 20 patients for teniae detection; CTC data from 50 patients for polyp classification | Automatic way to learn visual phrases | Computer-aided teniae detection, classification of colorectal polyps
Battiato et al. | 2013 | N-grams | SIFT | Flickr: 3300 images; UKBench: 10200 images | Exploits coherence between feature spaces not only in the image representation step but also during codebook creation; outperforms BoVP | Near-duplicate image detection
Monroy et al. | 2013 | N-gram combination | Discrete cosine transform | Histopathological dataset: 1417 images of 7 categories | 1 + 2 grams improve accuracy by 6% over BoVW | Histopathological classification for basal cell carcinoma
Mukanova et al. | 2014 | Shape N-grams | SIFT, perceptual shape features | Wang: 100 images of 10 categories; Caltech 256: 10 classes each with 80 images | Improves accuracy by 8% as compared to traditional BoVW | Classification of images
Pedrosa et al. | 2014 | Bag-of-2-grams, Bunch of 2-grams | SIFT | ImageCLEFmed 2007: 5042 biomedical images of 32 categories | Classification accuracy is improved by 6.03% as compared to traditional BoVW | Biomedical image classification
Pedrosa, Traina | 2014 | Sub-dictionaries | SIFT | ImageCLEF 2007: 5042 images of 32 categories | Boosted mAP by 2% and classification accuracy by 4.25% as compared to BoVW | Image retrieval and classification
Ruber Hernandez-Garcia et al. | 2018 | N-gram + Graph | SIFT | KTH, Weizmann, UCF Sports, UCF Youtube, Hollywood2 | Improvement in mean average precision is noticed as compared to BoVW | Action recognition from videos
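
The bunch-of-2-grams reduction discussed in this section, in which groups of 2-gram frequencies are each summarised by a Shannon entropy value, can be sketched as below. How the cited work forms its bunches is not specified here, so the random but fixed partition, the normalisation within each bunch, the base-2 logarithm, and the function name are all assumptions.

```python
import numpy as np


def bunch_entropy_features(hist, n_bunches, seed=0):
    """Reduce a 2-gram histogram to one Shannon entropy value per random bunch."""
    hist = np.asarray(hist, dtype=float)
    # Fixed seed: every image must use the same partition of the 2-gram dictionary.
    perm = np.random.default_rng(seed).permutation(len(hist))
    features = []
    for bunch in np.array_split(perm, n_bunches):
        counts = hist[bunch]
        total = counts.sum()
        if total == 0:
            features.append(0.0)  # an empty bunch carries no information
            continue
        p = counts[counts > 0] / total
        features.append(float(-(p * np.log2(p)).sum()))
    return np.array(features)
```

As an illustration of the scale of the reduction, summarising a 1,000-entry 2-gram histogram with 10 bunches shrinks the feature vector from 1,000 values to 10; these sizes are chosen only for the example.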

A problem with the word N-gram approach is that the number of phrases
can grow exponentially with the number of words in a phrase. A subset
of these phrases may be selected by using sophisticated mining algorithms,
but it is still risky to discard a large number of phrases, as some of them
may be the representative ones for an image. Moreover, as N increases,
this model produces very specific features, which makes it difficult for
classifiers to generalize well. For this model, the computational cost can be
further reduced by gray scale reduction, but choosing a gray scale reduction
that preserves image details and increases noise robustness while reducing the
computational cost is a challenging task. Also, the choice of N for best
performance varies according to the dataset used.


5.6 DEEP LEARNING MODELS FOR IMAGE CLASSIFICATION 


Deep learning neural networks are a class of machine learning algorithms
based on artificial neural networks. These networks do not require
handcrafted features. Various deep learning architectures
exist, such as convolutional neural networks, deep neural networks, recurrent
neural networks, and deep belief networks.

Breast cancer detection using a deep learning framework was quite
successful for cytology images. A combination of a convolutional neural
network (CNN) and an extreme learning machine achieved 99.5% accuracy
for cervical cancer classification. CNNs have also been shown to
achieve high accuracy for brain tumor classification by Amin et al. An
editorial also discusses how deep learning methods are used for medical
image segmentation, computer-aided detection, and classification tasks.

Bag of Visual Words and N-grams are still a good option when training
data are limited. A study by Kumar shows that BoVW (96.5% accuracy)
worked better than deep learning models (94.76% accuracy) for
histopathological image analysis. Another work by Huang demonstrates
that N-grams applied to focal liver lesion classification using CT images
provided 83% accuracy and also high training speed. Lopez-Monroy
used Distributional Term Representations (DTR) for various
image datasets and demonstrated that this technique works better than
deep learning neural networks. The DTR technique is a modified form
of visual N-grams and considers statistics of visual word occurrences and
co-occurrences.

Although deep learning has several advantages over traditional methods,
it becomes challenging in the field of medical image classification because
of the significant intra-class variation and inter-class similarity caused by
the diversity of clinical pathologies and imaging modalities. A synergic
deep learning approach has been proposed by Zhang et al. to overcome
this limitation. Another disadvantage of deep learning networks is that
they require a lot of training data, and obtaining annotated medical
images for training is a big challenge. Deep learning is also computationally
very expensive and requires high-power computational resources
such as a GPU environment and a lot of training time. Finally, it is very
difficult to comprehend what is learnt by a deep neural network.

The performance of the BoVW model is found to be less satisfactory for
geographical imagery because of the complexity and diversity of landscapes.
In this case, an experiment combining CNN-based spatial features with
BoVW-based image interpretation was very successful for the geographical
image classification task.


5.7 CONCLUSION AND FUTURE DIRECTIONS 


In this chapter, we discussed the literature on BoVW and N-gram models
for image classification and retrieval applications with respect to the local
features used, the vocabulary construction process, and the various N-gram
approaches. The BoVW model gives better classification
and retrieval performance than global image features such as
texture, color, and shape. The BoVW model, however, does not incorporate
spatial relationships. N-gram models try to incorporate spatial relationships
and increase performance, but they add to the computational
complexity. N-grams can be classified based on sampling strategy
(sparse or dense). Dense sampling approaches work better than
sparse sampling. We looked at various N-gram approaches, namely
keypoint-based, local patch-based, color-based, shape-based, and pixel-based
N-grams, the visual sentence approach, and CBOW. The literature demonstrates
that N-grams are a powerful and promising descriptor for image representation
and are useful for various applications such as content-based image
retrieval, classification, annotation, and action recognition for various types of
images (natural scenes, biomedical images, texture images, fingerprints)
and videos.

However, some of the challenges of N-gram models are reducing the
vocabulary size and computational cost, the choice of sampling strategy,
local features, and clustering algorithm, dimension reduction, and
reducing noisy word creation during the vocabulary construction process.

A recent trend in image classification and retrieval is to use deep learning
neural networks. In the deep learning approach, the work of generating and
optimizing image features is done automatically by the various layers of
the deep neural network. We have discussed some literature where the visual
N-grams model has outperformed deep learning models, with the added
advantages of less training time and less training data. Finally,
we have also pointed to experiments where BoVW features and
features from deep learning are combined for better accuracy on large
datasets. Therefore, we conclude that visual N-grams are still a good
choice for many image classification and retrieval applications where the
datasets are small.


ABSTRACT


Evolutionary algorithms (EAs) are a subject of artificial intelligence
studies in computer science. These algorithms, which simulate change
in nature, are applied to traditional computer algorithms. Some of the EAs
that adopt this idea are genetic algorithms (GAs), the bee colony algorithm,
the firefly algorithm (FA), particle swarm optimization (PSO), and the bacteria
foraging algorithm. EAs are used in many fields such as computer
networks, image processing, artificial intelligence, and cluster analysis. In
this chapter, studies conducted between 2014 and 2019 on the
application of EAs to 2D MR images are examined according to the
image processing stage in which these algorithms are used, the publication year
of the articles, and the classification of accuracy rates.


6.1 INTRODUCTION 


Brain MR images are one of the most commonly used image types in
the field of biomedical image processing. Today, many diseases, such as
cancer and schizophrenia (SZ), can be diagnosed by scientists from these images.
However, the duration of these manual diagnoses and their accuracy
may vary depending on the person's experience. Therefore,
computer-aided studies are needed in this field, and many papers
in the literature have addressed this subject.

Arunachalam and Savarimuthu proposed a computer-aided brain tumor
detection and segmentation method. The proposed system has stages of
enhancement, conversion, feature extraction, and classification. Brain
images are enhanced using the shift-invariant shearlet transformation (SIST).
Brain tumor detection is a difficult task because brain images contain
large variations in shape and density. Shanmuga Priya and Valarmathi
focused on edema and tumor segmentation based on skull extraction and
a kernel-based fuzzy c-means (FCM) approach. The clustering process was
developed by combining spatial information-based multiple kernels. Sajid
et al. presented a deep learning-based method for brain tumor segmentation
using different MRIs. The proposed convolutional neural network (CNN)
architecture uses a patch-based approach and takes local and contextual
information into account when estimating the output label. Patil and Hamde
proposed a computer-aided system based on monogenic signal analysis for
the recognition of brain tumor images. Textural identifiers from different
monogenic components were obtained using a completed local binary pattern
and a gray-level co-occurrence matrix. Kebir et al. presented a complete and
fully automated MRI brain tumor detection and segmentation methodology
using the Gaussian mixture model, FCM, active contour, wavelet transform,
and entropy segmentation methods. The proposed algorithm consists of
skull extraction, tumor segmentation, and detection sections.

As can be seen from the above studies, brain images are difficult to
study because of their anatomy. Therefore, combining several methods has
been found to be more useful. EAs are also frequently used as alternative
methods in many studies on brain images. In this chapter, studies using EAs
on 2-D brain MR images are presented and various results are discussed.


6.2 BACKGROUND 
6.2.1 EVOLUTIONARY ALGORITHMS AND APPLICATIONS

The motivation behind EAs is to make computational use of biological
evolution, mainly the natural mechanisms of life such as mutation and
selection, to solve real-life problems.**

EAs are metaheuristic optimization algorithms that work on the concept
of population. A metaheuristic is a high-level procedure designed to find,
generate, or select a heuristic (a partial search algorithm). Metaheuristics
can be applied to a variety of optimization problems with limited processing
capability and insufficient or incomplete information. In such cases, they
offer a good enough solution.** EAs are part of a more comprehensive set of
algorithms, evolutionary computation (EC), and are based on random search
and metaheuristics.**

More iterations are often required to improve the accuracy of the
candidate solutions optimized with EAs. However, there is no guarantee
that more iterations will always reduce the error.*8

The basic steps that EAs follow to find the best solution in each
iteration are as follows:


Step 1: Given a population of individuals, environmental pressure causes
natural selection, which raises the fitness of the population.

Step 2: Each individual is evaluated using the fitness function given
by the problem.

Step 3: Parents are selected from among the individuals according to
their fitness values.

Step 4: New individuals are produced by recombination of the parents
selected in Step 3.

Step 5: The fitness values of the old candidates and the new individuals
are compared to select the next generation.

Step 6: If the solution error is still greater than expected, return to
Step 1; otherwise, the iteration is terminated.**
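The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an implementation from any of the reviewed studies: the toy `sphere` fitness function, truncation selection of the better half, uniform crossover, Gaussian mutation, and all parameter values (`pop_size`, `sigma`, `tol`) are hypothetical choices made for the example.

```python
import random


def sphere(x):
    """Toy fitness function (lower is better): sum of squares."""
    return sum(v * v for v in x)


def evolve(fitness, dim=3, pop_size=20, generations=50, sigma=0.3,
           tol=1e-6, seed=0):
    rng = random.Random(seed)
    # Step 1: start from a random population of candidate solutions.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: evaluate every individual with the fitness function.
        scored = sorted(pop, key=fitness)
        best_error = fitness(scored[0])
        history.append(best_error)
        # Step 6: stop once the error is small enough, otherwise iterate.
        if best_error <= tol:
            break
        # Step 3: select parents by fitness (assumption: the better half).
        parents = scored[: pop_size // 2]
        # Step 4: recombination (uniform crossover) plus Gaussian mutation.
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append([rng.choice(pair) + rng.gauss(0, sigma)
                             for pair in zip(a, b)])
        # Step 5: compare old candidates and new individuals, keep the best.
        pop = sorted(parents + children, key=fitness)[:pop_size]
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the parents compete with their children in Step 5, the best error recorded in `history` never increases from one generation to the next, although, as noted above, additional iterations are not guaranteed to reduce it further.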


All EAs work on the common principle of the simulated evolution 
of the individual using selection, mutation, and reproduction processes. 
However, the algorithms may be different depending on the application 
and the forms in which they are used.™4 

Some of the best-known types of EAs are differential evolution (DE),
differential search algorithm (DSA), genetic programming (GP),
evolutionary programming (EP), evolution strategy (ES), genetic algorithm
(GA), and gene expression programming (GEP). In addition to these
EAs, there are also swarm intelligence algorithms, which are modeled on
the behavior of animal swarms. Some of these algorithms are the ant colony
algorithm, bee colony algorithm, cuckoo search algorithm, and particle
swarm optimization (PSO).*

EAs are used in many areas in the literature. One of these areas is
brain MRI processing. In the following section, studies between 2014 and
2019 that use 2-D brain MR images and EAs are reviewed.


6.3 BRAIN IMAGE APPLICATIONS AND STUDIES OF
EVOLUTIONARY ALGORITHMS


The use of EAs in biomedical image processing studies helps to achieve 
more successful results. Panda et al. used gray gradient information on brain 
MR images for thresholding (2016). Because there are many regions in the 
brain images, thresholding was performed at multiple levels. In the proposed 
method, a 2-dimensional histogram-based gray gradient is calculated 
and thus more edge information is preserved. Gray gradient information 
between pixel values and pixel mean values is used to minimize the loss of 
information. An evolutionary computational technique was used to optimize 
gray gradient information to determine optimal multilevel threshold values. 
For this purpose, a new adaptive swallow swarm optimization (ASSO) 
algorithm has been applied to the images. The performance of ASSO was 
found to be better than swallow swarm optimization (SSO). 

Narayanan et al.*° performed a study on brain tumors (2019). They have 
developed a new algorithm that uses two optimization techniques: PSO and 
bacterial foraging optimization (BFO) to clearly identify tumor regions 
and to segment tissues. Contrast limited adaptive histogram equalization 
(CLAHE) was used for preprocessing of brain MR images, and clustering 
of the contrast-enhanced image was performed with the modified fuzzy 
c-means (MFCM) algorithm. The local best and global best positions
in the clustered image were determined by the PSO algorithm, and the local
best position gave BFO the location from which to start its search. MFCM
then used the threshold value from BFO and the global best value from PSO
to reassess the clustering result. As a result of these procedures, the
tissue structures and the tumor area were determined using the proposed
PSBFO-MFCM algorithm. To demonstrate the strength of the algorithm, its
evaluation parameters were compared with those of state-of-the-art
techniques, and the algorithm obtained more successful results than the
other methods.**!**

Sarkar et al. applied a new unsupervised segmentation method to
natural images and medical images, including brain MR images, to improve
the distinction between objects within the framework of multi-objective
optimization.*® A multi-objective evolutionary algorithm (MOEA)-based
image segmentation technique using multilevel minimum cross-entropy
and Rényi entropy was proposed. MOEA/D-DE (decomposition-based MOEA
with DE), one of the MOEAs, was used to determine the optimal solution
set instead of the existing single-objective optimization techniques. The
thresholds used in multilevel segmentation were obtained from the
approximate Pareto fronts (PF) produced with MOEA/D-DE. The performance
of MOEA/D-DE was compared with that of other popular nature-inspired
single-objective and multi-objective optimizers. The performance of the
proposed algorithm was also tested on MR images containing brain
tumors.!*:'4
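The notion of a Pareto front used above can be illustrated with a short sketch. It assumes that both objectives are to be minimized and that each candidate threshold set has already been scored as a tuple of objective values. The function names and the sample points are hypothetical, and the sketch does not reproduce MOEA/D-DE itself.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) scores of five candidates.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

In this example, (3, 4) is dominated by (2, 3) and (5, 5) is dominated by (1, 5), so the front contains the remaining three trade-off solutions.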

Vishnuvarthanan et al. have developed a method that uses both 
optimization and clustering techniques to identify tumor areas on brain 
MR images.*° In this study, a new modified fuzzy k-means (MFKM) 
algorithm based on Bacteria Foraging Optimization (BFO) is proposed. 
It has been seen that the MFKM algorithm together with BFO improves 
segmentation on brain MR images. Compared to PSO-based FCM and 
FCM techniques, the BFOA-based MFKM method has greatly reduced 
computational complexity. The proposed methodology was evaluated by 
using comparison parameters. 

Agrawal et al. proposed a new method for intracranial segmentation
with optimum boundary point detection (OBPD) using the pixel intensity
values of brain MR images.! First, the skull was removed from the images.
Two boundary points were needed to divide the brain pixels into three
regions according to their intensity: gray matter (GM), white matter (WM),
and cerebrospinal fluid (CSF). The proposed GA-BFO hybrid algorithm was
used to calculate the final cluster centers of the FCM method, and thus
the optimal boundary points were obtained. Other soft computing techniques
(GA, PSO, and BFO) were also used for comparison.

Vishnuvarthanan et al. used BFO and MFCM algorithms in this study.*’ 
BFO was used for optimization and MFCM, the advanced version of the 
FCM algorithm, was used for clustering. Both techniques were used together
in a single framework for MR image segmentation; thus, effective tumor
detection and tissue segmentation were obtained simultaneously. The
proposed combination does not require frequent parameter tuning, which
would otherwise increase manual intervention and time consumption. It is
therefore thought that the method will facilitate the work of radiologists
in patient diagnostic procedures, and it is concluded that, with the
support of an automated algorithm, large volumes of clinical datasets can
be evaluated easily.

Chandra and Rao proposed a discrete wavelet-based GA to detect
tumors in brain MR images.'' A soft-thresholding discrete wavelet transform
(DWT) was used for enhancement and a GA for image segmentation.
First, MR images were enhanced using a discrete wavelet descriptor. Then,
the GA and unsupervised k-means clustering were used together
to obtain the most accurate segmentation. The GA was used to determine the
best combination of information obtained by the selected criterion. The
method was tested on more than 100 real brain MR images. The developed
method took advantage of the GA’s ability to solve optimization problems
with a large search space (the label of each pixel of the image).

Oliva et al. proposed a general method for image segmentation.*? The
method combines minimum cross entropy thresholding (MCET) with the crow
search algorithm (CSA). In the proposed approach, CSA, which is based on
the behavior of crow flocks, is used to estimate the threshold values by
minimizing the cross-entropy between classes. In each generation, CSA
encodes a set of candidate threshold points as solutions. The objective
function uses the cross-entropy to determine the quality of a candidate
solution. New candidate solutions are produced using predefined CSA
operators in accordance with the CSA rules and the value of the objective
function. The segmentation quality increases as the process progresses.
Compared with other optimization techniques used for segmentation, CSA
offers better performance and avoids critical flaws such as premature
convergence to suboptimal solutions and a poor exploration-exploitation
balance in the search strategy. The proposed method, which is a general
segmentation algorithm, provides excellent results in the automatic
segmentation of complex MR images. Statistically validated experimental
results showed that the proposed technique achieved better results in
terms of quality and consistency.
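To make the role of the objective function concrete, the following sketch evaluates a commonly used form of the minimum cross-entropy criterion on a gray-level histogram. It is written under several assumptions: only the threshold-dependent part of the criterion is computed, the toy histogram is invented for the example, and a plain exhaustive search over a single threshold stands in for the crow search algorithm, which is not implemented here.

```python
import math


def cross_entropy(hist, thresholds):
    """Threshold-dependent part of the minimum cross-entropy criterion.

    hist[i] is the number of pixels with gray level i; lower is better.
    Gray level 0 contributes nothing, so the classes start at level 1.
    """
    bounds = [1] + sorted(thresholds) + [len(hist)]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        m0 = sum(hist[i] for i in range(lo, hi))      # pixel count
        m1 = sum(i * hist[i] for i in range(lo, hi))  # intensity mass
        if m0 > 0 and m1 > 0:
            total -= m1 * math.log(m1 / m0)           # m1 * log(class mean)
    return total


def best_single_threshold(hist):
    """Exhaustive search; a hypothetical stand-in for the CSA optimizer."""
    return min(range(2, len(hist)), key=lambda t: cross_entropy(hist, [t]))


# Toy 16-level histogram with two modes, around levels 3 and 12.
hist = [0, 0, 5, 10, 5, 0, 0, 0, 0, 0, 0, 5, 10, 5, 0, 0]
threshold = best_single_threshold(hist)
```

For several thresholds the search space grows quickly, which is the situation in which a metaheuristic replaces the exhaustive loop: each candidate solution is simply a list of thresholds passed to `cross_entropy`.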

Hemanth and Anitha used modified GA approaches to overcome the
disadvantages of traditional approaches (2018). Three different modified GA
approaches were applied to the images during the feature selection stage.
For all these GA-based methods, the back propagation neural network
(BPN) was used as a classifier. Appropriate modifications were made to the
existing GA to minimize the randomness of the conventional GA. The
study focuses on the development of modified reproduction operators, which
form the core of the algorithm. Different binary operations were used to
produce offspring during the crossover and mutation processes. Unlike
conventional binary operations, which are applied randomly, the designed
binary operations serve a specific purpose in the GA. The application of
these approaches was examined in terms of medical image classification.
Abnormal brain MR images of four different classes were used. In
comparison with other methods, the proposed method achieved 98% accuracy.

De et al. applied a GA-based segmentation algorithm to automatically
group the unlabeled pixels of MR images into different homogeneous
clusters.'° This method does not require prior information about the
optimum number of segments. Using the fuzzy intercluster hostility index,
the centroids of the different segments are marked as active or inactive,
and the test images are segmented using the active centroids. In this
way, the optimal number of segments and their respective centroids are
obtained. The GA-based method with the fuzzy intercluster hostility index,
the automatic clustering using DE (ACDE) algorithm, and a nonautomatic GA
were compared on brain MR images of two different anatomies. The
comparison showed that the GA-based automatic image segmentation method is
superior to the other two algorithms.

Jothi and Inbarani developed and implemented a supervised hybrid
feature selection algorithm called tolerance rough set firefly-based
quick reduct (TRSFFQR) for MR brain images.”! This intelligent
hybrid system aims to take advantage of the basic models and to mitigate
their limitations. Different categories of features, that is, shape-,
intensity-, and texture-based features, are extracted from segmented MR
images. A hybridization of two techniques, tolerance rough set (TRS) and
the Firefly algorithm (FA), was used to select the relevant features of a
brain tumor. TRSFFQR was compared with Artificial Bee Colony (ABC),
Cuckoo Search Algorithm (CSA), supervised tolerance rough set-PSO-based
relative reduct (STRSPSO-RR), and supervised tolerance rough
set-PSO-based quick reduct (STRSPSO-QR) in terms of performance. As
a result, both the efficiency of the technique and its improvements over
existing supervised feature selection algorithms were observed.

Akdemir Akar used the bilateral filter (BF), an edge-preserving method,
to remove Rician noise from MR images.* Denoising performance
varies according to the selection of the BF parameters. For this reason,
the parameters of the BF were optimized by the GA in the study. The
importance of parameter selection in the BF was shown by comparing quality
metrics such as mean square error (MSE), peak signal to noise ratio
(PSNR), signal to noise ratio (SNR), and structural similarity index metric
(SSIM), as well as the denoising results, with those of other BFs.
Experimental results showed that the BF with the proposed parameters
performs better.
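Three of the quality metrics named above have simple closed forms, sketched below for images stored as nested lists of gray values. The 8-bit peak value of 255 and the tiny sample images are assumptions made for the example, and SSIM is left out because it needs local window statistics.

```python
import math


def _pairs(ref, test):
    """Yield corresponding pixel pairs of two equally sized images."""
    for row_r, row_t in zip(ref, test):
        yield from zip(row_r, row_t)


def mse(ref, test):
    """Mean square error between a reference and a processed image."""
    errors = [(a - b) ** 2 for a, b in _pairs(ref, test)]
    return sum(errors) / len(errors)


def psnr(ref, test, max_value=255):
    """Peak signal to noise ratio in dB (assumes 8-bit data by default)."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)


def snr(ref, test):
    """Signal to noise ratio in dB: signal energy over error energy."""
    signal = sum(a * a for row in ref for a in row)
    noise = sum((a - b) ** 2 for a, b in _pairs(ref, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


reference = [[100, 100], [100, 100]]
denoised = [[110, 90], [100, 100]]
scores = (mse(reference, denoised), psnr(reference, denoised),
          snr(reference, denoised))
```

A parameter search such as the GA described above could use any of these values as its fitness, for example by maximizing the PSNR of the filtered image over the filter parameters.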

The level set-based Chan and Vese algorithm is a widely used region-based
model among active contour models for image segmentation and
naturally uses the intensity homogeneity in each region. However, in this
model, when the contour is not initialized properly, the algorithm may get
trapped in a local minimum. This problem becomes more critical on medical
images, where intensity variations occur in more varieties and scales.
Mandal et al.” proposed a version of the Chan and Vese algorithm that is
independent of the initial contour (2014). In this study, the energy
minimization problem to be solved is formulated using the PSO technique,
which is one of the metaheuristic optimization algorithms. The algorithm
has been successfully applied to both scalar and vector-valued images.
Experiments with different types of medical images have shown that the
proposed method can significantly improve the quality of the segmentation
obtained by the Chan and Vese algorithm.
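Since PSO appears in many of the studies reviewed here, a minimal sketch of the standard global-best variant is given below. It minimizes a toy `sphere` function instead of a Chan and Vese energy, and the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the search range are hypothetical values chosen only for illustration.

```python
import random


def sphere(x):
    """Toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)


def pso(objective, dim=2, particles=15, iterations=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

In a segmentation setting, each particle position would encode the quantities being optimized (for instance thresholds, cluster centers, or contour parameters) and `objective` would be the corresponding energy or fitness function.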

Huang et al. introduced a new neighborhood intuitionistic
fuzzy c-means clustering algorithm with a genetic algorithm
(NIFCMGA).” This algorithm exploits the advantages of the intuitionistic
FCM clustering algorithm to maximize benefits and reduce noise/outlier
effects. GAs were used to determine the optimal parameters of the
algorithm. The proposed technique has been successfully applied to the
clustering of different regions of MR and CT images and can be extended to
diagnose abnormalities. Comparisons with other methods showed the
superior performance of the proposed algorithm.

Ding et al. combined a multi-agent consensus MapReduce optimization
model with coevolutionary quantum PSO with self-adaptive memeplexes to
design an attribute reduction method, and proposed the
multiagent-consensus-MapReduce-based attribute reduction (MCMAR)
algorithm.'’ First, coevolutionary quantum PSO with self-adaptive
memeplexes is designed to group particles into different memeplexes in
order to explore the search space and find the best region for the
attribute reduction of large datasets. Second, a four-layer neighborhood
radius framework with a compensatory scheme is constructed to partition
large attribute sets by exploiting the interdependence among the
attribute sets. Third, a new multi-agent consensus MapReduce optimization
model is adopted to perform multiple-relevance-attribute reduction, using
five types of agents to carry out the ensemble coevolutionary
optimization. In this way, a uniform reduction framework for the
coevolutionary game of different agents under bounded rationality is
further refined. Fourth, an approximate MapReduce parallelism mechanism
is used to formalize the multi-agent coevolutionary consensus structure,
interaction, and adaptation, which enables the different agents to share
their solutions. Finally, extensive experimental studies demonstrated the
efficacy and accuracy of MCMAR on some well-known benchmark datasets.
Furthermore, successful applications to large medical datasets are
expected to scale up MCMAR in terms of efficiency and feasibility for
complex infant brain MRIs.

Mekhmoukh and Mokrani proposed a new method for image segmentation
based on PSO and outlier rejection combined with a level set.”
The conventional FCM algorithm is sensitive to noise and intensity
inhomogeneity in the image and depends on the initialization of the cluster
centers. A new FCM version was developed to improve the outlier rejection
and reduce the noise sensitivity of the conventional FCM clustering
algorithm for image segmentation. In FCM, the initial cluster centers are
usually selected randomly, whereas in the proposed method the cluster
centers are selected optimally with PSO. In addition, spatial neighborhood
information is taken into account when performing the calculations. The
improved kernel possibilistic c-means (IKPCM) algorithm developed in
the study was tested on synthetic, simulated, and medical images and
compared with different versions of FCM. It was shown to provide good
segmentation and extraction of various tissues and to be more robust to
noise.
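The dependence of FCM on its initial cluster centers is easier to see in code. The sketch below is the conventional FCM iteration on scalar intensities, not the IKPCM algorithm of the study. The initial centers are passed in as an argument, which is where a PSO-selected initialization would enter; the fuzzifier `m = 2`, the sample data, and the starting centers are assumptions for the example.

```python
def fcm_1d(data, centers, m=2.0, iterations=50):
    """Conventional fuzzy c-means on scalar data.

    Returns the final centers and the membership matrix u, where
    u[k][i] is the membership of data[k] in cluster i.
    """
    centers = list(centers)
    u = []
    for _ in range(iterations):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        u = []
        for x in data:
            d = [abs(x - c) + 1e-12 for c in centers]  # avoid division by 0
            u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                      for di in d])
        # Center update: mean of the data weighted by u_ki^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, data)) / sum(weights))
        centers = new_centers
    return centers, u


intensities = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centers, memberships = fcm_1d(intensities, centers=[0.0, 5.0])
```

With well-separated data, as here, almost any initialization converges to the same centers. The initialization matters for noisy or inhomogeneous images, which is the motivation given above for selecting the centers with an optimizer.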

Stochastic resonance (SR) is used to enhance low-contrast, noisy
images.*! Singh et al. developed a modified neuron model-based SR
for brain MR images with T1-weighted, T2-weighted, fluid-attenuated
inversion recovery (FLAIR), and diffusion-weighted imaging
(DWI) sequences.*! The multi-objective bat algorithm was used to tune
the parameters of the modified neuron model, because the quality of the
processed image depends on the selection of these parameters. It was
observed that the proposed approach performed well in the enhancement of
MR images, and as a result, the difference between GM and WM became
apparent.

Anaraki et al. proposed a method using CNNs and a GA to noninvasively
classify different grades of glioma, which is one of the brain tumor
types.° The GA was used to determine the CNN architecture instead of
relying on trial and error or adopting predefined structures. According to
the results, the accuracy of classifying three glioma grades was 90.9%.
An accuracy of about 94.2% was obtained in the experiment in which the
tumor types were classified as glioma, meningioma, and pituitary. These
results showed that the method is effective in the classification of tumors
on brain MR images and that, because of its flexibility, it can help
doctors in the early stages of diagnosis.

Manikan et al. performed segmentation of brain MR images using
simulated binary crossover (SBX)-based multilevel thresholding with a
real coded GA.*° T2-weighted brain MR images were selected for the
experiments. The entropy was maximized to achieve optimal multilevel
thresholding. The results of the proposed algorithm were compared with
those of algorithms such as Nelder-Mead simplex, PSO, bacterial foraging
(BF), and adaptive bacterial foraging (ABF). The results
showed that the proposed method had better performance for medical
images and was more consistent than previous methods.
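Simulated binary crossover operates directly on real-valued genes such as threshold values. The sketch below shows the usual textbook form of the operator for a single pair of parent values. The distribution index `eta = 2.0` and the parent thresholds in the example are assumptions, and the sketch is not taken from the reviewed study.

```python
import random


def sbx_pair(p1, p2, eta=2.0, rng=random):
    """Simulated binary crossover for one pair of real-valued genes.

    A larger distribution index eta keeps the children closer to
    their parents.
    """
    u = rng.random()  # uniform in [0, 1)
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


# Cross two hypothetical parent thresholds on the gray-level axis.
child_a, child_b = sbx_pair(40.0, 120.0, rng=random.Random(1))
```

By construction, the two children are symmetric about the midpoint of the parents, so their mean equals the parents' mean. For thresholding, the children would typically be clipped to the valid gray-level range before evaluation.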

Kotte et al. applied adaptive wind-driven optimization (AWDO)-based
multilevel thresholding to brain MR image segmentation.” The images used
for segmentation were selected from axial T2-weighted brain MR
images. The efficacy of AWDO for multilevel thresholding of MR images had
not been investigated before, and the study makes a small contribution in
this direction. Optimum multilevel thresholding was achieved by maximizing
Kapur’s entropy and the between-class variance (Otsu’s method). To
investigate the effectiveness of the algorithm, it was compared with
algorithms such as RGA, GA, Nelder-Mead simplex, PSO, BF, and ABF. The
results of the comparisons showed the superiority of the segmentation
produced by the proposed algorithm.
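The two objective functions mentioned above can be written compactly for a gray-level histogram and an arbitrary list of thresholds. The sketch below assumes the standard definitions of Kapur's entropy and of Otsu's between-class variance. The helper `_classes` and the toy histogram are hypothetical, and the optimizer that would search over the thresholds is not shown.

```python
import math


def _classes(hist, thresholds):
    """Split the normalized histogram into classes at the thresholds.

    Returns (first gray level, probabilities) for each class.
    """
    total = sum(hist)
    probs = [h / total for h in hist]
    bounds = [0] + sorted(thresholds) + [len(hist)]
    return [(lo, probs[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def kapur_entropy(hist, thresholds):
    """Sum of the Shannon entropies of the classes (to be maximized)."""
    score = 0.0
    for _, p in _classes(hist, thresholds):
        w = sum(p)
        if w > 0:
            score -= sum((q / w) * math.log(q / w) for q in p if q > 0)
    return score


def otsu_variance(hist, thresholds):
    """Between-class variance of the classes (to be maximized)."""
    mean_all = sum(i * h for i, h in enumerate(hist)) / sum(hist)
    score = 0.0
    for lo, p in _classes(hist, thresholds):
        w = sum(p)
        if w > 0:
            mu = sum((lo + j) * q for j, q in enumerate(p)) / w
            score += w * (mu - mean_all) ** 2
    return score


# Toy 16-level histogram with two modes, around levels 3 and 12.
hist = [0, 0, 5, 10, 5, 0, 0, 0, 0, 0, 0, 5, 10, 5, 0, 0]
between_modes = otsu_variance(hist, [8])
inside_a_mode = otsu_variance(hist, [3])
```

On this histogram, a threshold placed between the two modes gives a much larger between-class variance than one placed inside a mode, which is the behavior an optimizer exploits when it searches for several thresholds at once.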

Nayak et al.*’ have proposed a new pathological brain detection 
system (PBDS) (2018). They used CLAHE to improve the quality of 
brain MR images. Discrete ripplet-II transform (DR2T) with degree 2 
was then applied to the enhanced images. The PCA + LDA approach was
adopted to reduce the large number of coefficients obtained with
DR2T. As a final step, the MPSO-ELM algorithm, a combination of modified
particle swarm optimization (MPSO) and extreme learning machine (ELM),
was used to classify the MR images as pathological or healthy. The purpose
of MPSO in this algorithm is to optimize the parameters of the hidden nodes
in single-hidden-layer feedforward networks. The proposed method was
compared with other methods on three benchmark datasets. The comparisons
showed that the proposed method achieves higher classification accuracy
with fewer features. With the MPSO-ELM algorithm, higher accuracy values
were obtained than with the ELM and BPNN classifiers.

Khorram and Yazdi presented an optimized thresholding method that
uses the ant colony algorithm for the segmentation of brain MR images.”
In the algorithm, the textural characteristics of the brain were used
as heuristic information. The algorithm was designed so that the ants can
move beyond their eight nearest neighbors, which increased the exploration
ability of the ants. The applied method showed better performance in
post-processing image enhancement based on homogeneity. In experiments
performed with axial T1-weighted brain MR images, a higher accuracy was
obtained than with conventional heuristic methods, K-means, and
expectation maximization.

Pham et al.*' proposed a new clustering method for brain MR
segmentation (2018). For this purpose, a new objective function was first
formulated using kernelized fuzzy entropy clustering with local spatial
information and bias correction (KFECSB). Next, an algorithm using an
improved PSO with a new fitness function was applied to the images for
better segmentation. The performance of the proposed method was tested
on a variety of simulated and real brain MR datasets. The tests showed
that the method was more effective than five other state-of-the-art
methods in the literature and that it provides better results than these
methods on noisy images and images with inhomogeneous intensity.

Ahmed et al. proposed a hybrid method to classify brain MR images as 
benign and malignant. For this classification process, gray wolf optimizer 
(GWO) and support vector machine (SVM) with radial basis function 
(RBF) kernel methods were combined. The proposed method achieved a
classification accuracy of 98.75%.

Agrawal et al.’ presented an absolute intensity difference-based
(AIDB) technique using adaptive coral reef optimization (ACRO) for
brain MR image thresholding (2017). The intensity difference information
in the brain image was extracted from the two-dimensional histogram
matrix. Since brain images contain multiple regions, multilevel
thresholding is appropriate, and the AIDB technique is used for this
purpose in the proposed method. The ACRO method was applied to the images
to maximize the fitness function. T2-weighted brain MR images were selected
from the Harvard medical dataset for the tests. According to the
results, the proposed technique provides better performance
than other standard methods.

Nabizadeh et al.*° proposed an algorithm for segmentation and detec- 
tion of brain stroke and tumor lesions (2014). For this purpose, they used 
the histogram-based gravitational optimization algorithm (HGOA). In 
the algorithm, histogram-based techniques are used to detect initial brain 
segments. Later, the gravitational optimization-based algorithm was 
applied to reduce the number of these segments. Finally, thresholding is
performed to determine whether a segment is a tumor or a stroke lesion. In
addition, the method does not depend on atlas registration, prior
anatomical information, or bias correction. Accuracy values were 91.5% for
the ischemic stroke lesions and 88.1% for the tumor lesions.

In another study,’ an optimized method was proposed for processing brain
MR images using morphological filters compatible with the human visual
system (HVS) (2019). Using the logarithmic image processing model, top-hat
and bottom-hat morphological operators were combined to make the
filtering consistent with the HVS. In the morphological filter application,
it was necessary to select a structuring element with an appropriate shape
and size in order to detect the tumor correctly. However, this selection
is difficult because the shape and structure of the tumor may vary across
different stages. Therefore, the structuring element was optimized using
PSO. The results were evaluated with metrics such as contrast
improvement index (CII), average signal to noise ratio (ASNR), PSNR,
and measure of enhancement (EME). According to these results, the
metric values obtained by enhancement with PSO were higher than those
obtained without PSO.

Virupakshappa and Amarapur performed segmentation and classification
of brain MR images using the Modified Level Set
approach and Adaptive ANN.°*° For image class estimation, it is crucial
to extract useful features from the image. The features extracted in
the method were multilevel wavelet decomposition features. The
classification was performed with an adaptive artificial neural network
(AANN), in which the whale optimization algorithm (WOA) was used to
optimize the ANN. Optimizing the network structure in this way provides
better classification results for tumors in segmented images. The results
obtained using the proposed method were compared with those of previous
methods. About 98% classification accuracy was obtained with the proposed
method.

Subashini et al. (2019) have developed a noninvasive method to identify
the grades of tumors found in brain MR images.** A median filter and a
pulse coupled neural network were used for preprocessing the images. FCM
and watershed methods were applied to the images in the segmentation
stage. The tumor was separated from the MR image by Sobel edge
detection and morphological operators. Feature extraction techniques
were then applied to the images to obtain various features. About 91%
classification accuracy was obtained with the system. In addition, the
method was found to require less time for brain tumor grade identification.

Registration, one of the image processing techniques, aligns multiple
images to produce a new, more informative image.* Pradhan and Patra
introduced an original hybrid method whose objective is the optimization
of the similarity measure in intensity-based nonrigid image
registration.# For this purpose, the bacterial foraging algorithm (BFA)
was used to find the optimum regional mutual information with the P-spline
interpolation method. However, the calculation time for this process was
high. Therefore, quantum-behaved particle swarm optimization (QPSO)
and BFA were merged. With this combination, the number of parameters
to be optimized was reduced and the calculation time was shortened.

Yang et al.°° presented a wavelet energy-based method to classify brain 
MR images as normal or abnormal (2016). Brain images were classified 
with SVM and weights of SVM were optimized with biogeography-based 
optimization (BBO). According to the sensitivity and accuracy results, 
the performance of BBO-KSVM was superior to back propagation neural 
network (BP-NN), KSVM (kernel SVM), and PSO-KSVM. 

Zhang et al.® have developed a PBDS for brain MRI (2016). For this
purpose, 12 fractional Fourier entropy (FRFE) features were first
extracted from each of the brain images. The features were
then used in a multilayer perceptron (MLP) classifier. Two improvements
were made to the MLP. First, the optimal number of hidden neurons was
determined by a pruning technique; dynamic pruning (DP), Bayesian
detection boundaries (BDB), and Kappa coefficient (KC) were compared for
this purpose. Second, adaptive real-coded biogeography-based optimization
(ARCBBO) was used to train the biases and weights of the MLP. The proposed
FRFE + KC-MLP + ARCBBO method obtained an average accuracy of 99.53%.

Rajesh et al.“ presented a system for the detection and classification of
brain tumors (2019). The differential based adaptive filtering (DAF) method
was used to remove the noise in the images during the preprocessing
stage. Skull removal was also performed using erosion. The tumors were
segmented with a region growing algorithm. Rough set theory
(RST) was used to extract the features of the segmented images. A
particle swarm optimization neural network (PSONN) was trained and tested
to classify the images as normal or abnormal. In PSONN, PSO searches for
the training parameters, in other words, the decision variables of the
optimization. PSO was used with an artificial neural network to
minimize the MSE and improve the learning process.

Lahmiri [28] compared three systems for detecting gliomas in brain MR
images (2017). Each system used a different PSO technique to segment the
images: classic PSO, Darwinian particle swarm optimization (DPSO), and
fractional-order DPSO (FODPSO). After segmentation, the directional
spectral distribution (DSD) signature of each image was calculated. The
multifractals of the DSD were then estimated using generalized Hurst
exponents. Finally, these fractal features were classified with an SVM.
The classification accuracy of the three systems was evaluated using the
leave-one-out cross-validation method (LOOM). According to the results,
each of the three systems performed better than the previous ones.
However, the FODPSO-DSD-multi-scale analysis (MSA) system was considered
more promising for the clinical environment because of its high accuracy
and low processing time.

Nayak et al. [37] developed a new PBDS (2018). In this system, CLAHE was
used to enhance the quality of brain MR images. Features were extracted
with a two-dimensional PCA (2DPCA) method, and a compact and
discriminative feature set was created using a PCA + LDA combination. A
combination of modified differential evolution (MDE) and ELM was used to
classify images as healthy or pathological. In this combination, the
input weights and hidden biases of a single-hidden-layer feedforward
neural network (SLFN) were optimized with MDE. The proposed system was
evaluated on three datasets. According to the results, the proposed
method was superior to comparable methods, and the MDE-ELM classifier
was more accurate than conventional algorithms.

Bose and Mali [10] proposed an algorithm that combines FCM and ABC for
image segmentation (2016). In this algorithm, the fuzzy membership
function is used to find the cluster centers, which are optimized by
ABC. Compared with other optimization techniques such as PSO, GA, and
EM, the new algorithm (FABC) was found to be more efficient. FABC did
not depend on the selection of initial cluster centers and performed
better in terms of convergence, time complexity, robustness, and
segmentation accuracy, thereby eliminating the main disadvantages of
FCM. The algorithm gains efficiency by using the random properties of
ABC to initialize the cluster centers. GA, PSO, EM, and the proposed
algorithm were applied to synthetic, tissue, and medical images,
including brain images, and these experiments demonstrated the
effectiveness of the proposed algorithm.
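
To make the role of the fuzzy membership function concrete, the sketch
below shows the standard FCM membership and center-update formulas on
one-dimensional intensities. It is textbook FCM only, not the FABC
algorithm: the ABC step that proposes and refines centers is omitted,
and the function names, fuzzifier m = 2, and sample intensities are
assumptions for illustration.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Membership of each point in each cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists)
               for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each center as a membership-weighted mean of the points."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points))
                       / sum(weights))
    return centers


# Hypothetical gray-level intensities and initial cluster centers.
intensities = [10.0, 12.0, 200.0, 210.0]
initial_centers = [0.0, 255.0]
u = fcm_memberships(intensities, initial_centers)
new_centers = fcm_update_centers(intensities, u)
```

In a hybrid such as the one described above, an optimizer would supply
or refine the candidate centers instead of relying on a fixed initial
guess.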

Kaur et al. [22] presented a multilevel thresholding method for the
automatic segmentation of lesions in brain MR images (2018). The method
used the density and edge information found in the GLCM and image
histograms to calculate multiple thresholds. To reduce the high
computational complexity resulting from the search, the fitness function
was optimized with a mutation-based particle swarm optimization (MPSO)
technique, whose search capabilities are better than those of the
conventional version. The proposed method was compared with competing
algorithms according to three different measures and performed better
than them.

SZ is one of the most important brain diseases worldwide. Most analyses
are based on volumetric measurements of brain MR images, but these
measurements vary with the heterogeneity of SZ [31]. Manohar and
Ganesan [31] therefore examined the links between schizophrenic MR
images and typical images. Texture features such as Hu moments, gray
level co-occurrence matrix (GLCM), Zernike moments, and structure tensor
were used to represent specific pattern changes in schizophrenic MR
images. Healthy and schizophrenic images were distinguished with a
binary particle swarm optimization (BPSO)-based fuzzy SVM (FSVM)
classifier, with the mutual information quotient as the objective
function in the feature selection stage. The skull portion of the brain
MR images was stripped with a nonparametric region-based active contour.
According to the results, the proposed method separated the brain region
from the skull better than other methods. BPSO-FSVM achieved about 90%
accuracy, a better classification result than BPSO-SVM.

Mishra et al. [34] presented a new model for brain tumor detection and
classification (2019). An improved fast and robust FCM algorithm was
developed for segmentation, to smooth the brain MR images and reduce
their noise. The gray level co-occurrence matrix technique was used to
extract features from the images. These features were then fed to an
LLRBFNN model based on the modified adaptive sine cosine optimization
algorithm-particle swarm optimization (MASCA-PSO), which was proposed
for classifying tumors as benign or malignant. Optimizing the weights of
the LLRBFNN model with the MASCA-PSO algorithm avoided manual detection
by radiologists. Comparisons with different models in terms of
classification accuracy showed that the proposed model gave better
results.


6.4 RESEARCH CHALLENGES AND PROSPECTS OF 
EVOLUTIONARY ALGORITHMS 


In this chapter, the studies between 2014 and 2019, in which EAs were 
applied to 2-dimensional brain MR images, were examined. In order to 
analyze the effects of these studies on brain MR images, the studies were 
compared according to the methods used, publication years, datasets and 
accuracy rates. 

EAs have been used in many different stages of brain MR image
processing. Tables 6.1-6.7 show which methods are used in which stages.
Table 6.1 lists the studies using PSO by year and stage of use. PSO is
used in most of the image processing stages. In some studies, the PSO
algorithm is combined with other EAs; in others, modified versions of
the PSO algorithm are presented.


TABLE 6.1 Brain MR Image Processing Stages in Studies Using PSO.


Year  Reference  Method(s)                        Stage of use
2019  [36]       PSOBFO                           Segmentation
2014  [29]       PSO                              Image denoising
2018  [17]       PSO                              Feature reduction
2015  [32]       PSO                              Segmentation
2018  [37]       MPSO                             Classification
2018  [41]       PSO                              Segmentation
2019  [9]        PSO                              Image filtering
2015  [43]       BF QPSO                          Image registration
2019  [44]       PSONN                            Feature extraction
2017  [28]       Classical PSO, DPSO, or FODPSO   Segmentation
2018  [22]       MPSO                             Thresholding
2018  [31]       BPSO                             Classification
2019  [34]       MASCA-PSO                        Feature extraction


According to Table 6.2, the DE algorithm was used in only two studies,
published in 2017 and 2018. One of these studies used the algorithm for
segmentation and the other for classification.


TABLE 6.2 Brain MR Image Processing Stages in Studies Using DE.


Year  Reference  Method(s)   Stage of use
2017  [48]       MOEA/D-DE   Segmentation
2018  [37]       MDE         Classification


The evaluation of the BFO algorithm is given in Table 6.3. Similar 
to PSO, BFO has been used in conjunction with other EAs in some 
studies. The stages in which this algorithm is used are segmentation and 
thresholding. 

According to the evaluation in Table 6.4, GA was used in many studies,
as was PSO. This algorithm has been utilized in almost every stage of
the studies and is often combined with other methods. The table also
lists the studies that use modified versions of GA.


TABLE 6.3 Brain MR Image Processing Stages in Studies Using BFO.


Year  Reference  Method(s)  Stage of use
2017  [56]       BFO        Segmentation
2014  [1]        GA BFO     Segmentation
2018  [57]       BFO        Segmentation
2019  [36]       PSO BFO    Segmentation


TABLE 6.4 Brain MR Image Processing Stages in Studies Using GA.


Year  Reference               Method(s)                                         Stage of use
2016  [11]                    GA                                                Segmentation
2018  (Hemanth et al., 2018)  Three GA combinations                             Feature selection
2016  [4]                     GA                                                Image denoising
2016  [15]                    GA                                                Segmentation
2015  [20]                    NIFCMGA                                           Segmentation
2019  [6]                     GA                                                Classification
2014  [30]                    Real coded genetic algorithm                      Segmentation
2014  [1]                     GA BFO                                            Segmentation
2019  [46]                    MedGA (Medical image preprocessing based on GAs)  Thresholding
2014  [5]                     GA                                                Preprocessing
2017  [27]                    GA                                                Feature selection
2019  [8]                     GA                                                Feature selection
2019  [33]                    GA                                                Segmentation
2018  [50]                    GA                                                Feature selection
2019  [52]                    GA                                                Feature selection


ACO, another algorithm evaluated in Table 6.5, also appears in various
studies. In these studies, the algorithm is mostly used in the
segmentation and thresholding stages. It is sometimes used in
conjunction with other algorithms and sometimes in a modified form.

The BBO algorithm in Table 6.6 was used in two studies, in both cases in
the classification stage.

Table 6.7 provides information on two separate studies using CSA. In
these studies, CSA was used for thresholding and image enhancement,
which are preprocessing stages.


TABLE 6.5 Brain MR Image Processing Stages in Studies Using ACO.


Year  Reference  Method(s)                                             Stage of use
2019  [25]       ACO                                                   Thresholding
2016  [10]       FABC (Fuzzy-based artificial bee colony optimization) Segmentation


TABLE 6.6 Brain MR Image Processing Stages in Studies Using BBO.


Year  Reference  Method(s)  Stage of use
2016  [59]       BBO        Classification
2016  [60]       ARCBBO     Classification


TABLE 6.7 Brain MR Image Processing Stages in Studies Using CSA.


Year  Reference  Method(s)  Stage of use
2017  [39]       CSA        Thresholding
2017  [18]       CSA        Image enhancement


Table 6.8 lists the other EAs and the stage in which each was used in
the related studies. According to Tables 6.1-6.8, GA and PSO are clearly
the most used algorithms in the years covered. The algorithms in these
tables have often been combined with other methods. In some studies, the
methods were modified rather than used in their original form.


TABLE 6.8 Brain MR Image Processing Stages in Studies Using Other
Evolutionary Algorithms.


Year  Reference             Method(s)                        Stage of use
2016  [21]                  FA                               Feature selection
2017  [51]                  Bat optimization (BO)            Image enhancement
2019  [3]                   GWO                              Classification
2017  [2]                   ACRO                             Thresholding
2014  [35]                  HGOA                             Segmentation
2018  [55]                  WOA                              Classification
2016  [53]                  SFLA                             Feature extraction
2016  (Panda et al., 2016)  ASSO                             Thresholding
2019  [16]                  Social group optimization (SGO)  Thresholding
2018  [26]                  AWDO                             Segmentation

Table 6.9 compares the studies by publication year. Studies that apply
EAs to brain MR images were mostly published in 2016, 2018, and 2019.


TABLE 6.9 Comparison of Studies by Years.


Year  Publishing                                                                          Total
2014  [1], [29], [30], [35], [5]                                                          5
2015  [20], [32], [43]                                                                    3
2016  (Panda et al., 2016), [11], [15], [21], [4], [53], [59], [60], [10]                 9
2017  [48], [56], [39], [51], [2], [28], [18], [27]                                       8
2018  [57], (Hemanth et al., 2018), [17], [26], [37], [41], [55], [37], [22], [31], [50]  11
2019  [6], [25], [3], [44], [34], [16], [46], [8], [33], [52], [9]                        11
2019 [6], [25], [3], [44], [34], [16], [46], [8], [33], [52], [9] 11 


In Table 6.10, the studies are compared by dataset. Harvard Medical
University, BrainWeb Simulated, and several hospital databases are the
most preferred.

The accuracy rates of studies performed on the same dataset in the same
image processing stage are also compared. The databases that meet these
criteria are Harvard Medical University and BRATS 2015, and the image
processing stage that meets them is classification.

In Table 6.11, the datasets, classification methods, and accuracy rates
of the studies are given.


6.5 FUTURE RESEARCH DIRECTIONS 


According to the results obtained from the studies, EAs have had an
important impact in this field. These algorithms have helped the other
methods used alongside them and provided better results. There are many
examples in which an EA is used in conjunction with other methods or
other EAs. In addition, these algorithms are often used in modified
rather than original form. In this way, better results could be obtained
from studies with hybrid or modified models.


TABLE 6.10 Datasets Used in the Related Studies.


Dataset                                                         Publishing
Harvard Medical University                                      (Panda et al., 2016), [4], [30], [37], [3], [2], [59], [28], [37], [34], [18], [50], [52]
Harvard Brain Web Repository                                    [36], [56], [57]
BrainWeb Simulated Database                                     [36], [56], [1], [57], [39], [4], [32], [41]
BRATS                                                           [16]
BRATS 2015                                                      [3], [55], [22]
BRATS 2013                                                      [36], [57]
BRATS 2012                                                      [1], [25], [22]
Internet Brain Segmentation Repository (IBSR)                   [1], [32], [41]
Several Hospital Database                                       (Hemanth et al., 2018), [53], [44], [22], [16], [46], [5], [50], [36]
National Cancer Institute (NCI)                                 [21]
National Institute of Health (NIH)                              [29]
IXI Dataset                                                     [6], [5]
REMBRANDT Dataset                                               [
TCGA-GBM Data Collection                                        [
TCGA-LGG Dataset                                                [
Neuroimaging Tools and Resources (NITRC)                        [35]
Whole Brain Atlas (WBA) (WBA 2019)                              [
IBSR 2019                                                       [
Three benchmark datasets, namely, DS-I, DS-II, and DS-III       [
National Alliance for Medical Image Computing (NAMIC) database  [31]
ISLES2015                                                       [16]
SICAS Medical Image Repository                                  [27]

The use of these algorithms on difficult images, such as brain MRI, has
helped improve the results obtained. One reason these data are difficult
to work with is the challenge of creating a useful dataset. Sometimes
real datasets are not available, or their quality is not sufficient for
processing. This has a significant effect on the performance of the
proposed methods. Researchers therefore have to scan many sources for a
suitable dataset and build the dataset accordingly.


TABLE 6.11 Comparison of Studies Using Other Datasets in Terms of
Classification Methods and Accuracy Rates.


Dataset                                                 Publishing  Classification                                      Accuracy rate
IXI, REMBRANDT, TCGA-GBM, TCGA-LGG                      [6]         CNN-GA                                              94.2%
Medical School of Harvard University (DS-66)            [37]        MPSO-ELM                                            100%
Medical School of Harvard University (DS-160)           [37]        MPSO-ELM                                            100%
Medical School of Harvard University (DS-255)           [37]        MPSO-ELM                                            99.69%
Medical School of Harvard University, BRATS 2015        [3]         GWO-SVM                                             98.75%
MICCAI, BRATS 2015                                      [55]        WOA-ANN                                             98%
Harvard Medical University database                     [59]        BBO-KSVM                                            97.78%
Open access dataset                                     [60]        BBO-MLP                                             99.53%
Government Medical College Hospital, Trivandrum, India  [44]        PSONN (Particle Swarm Optimization Neural Network)  96%
Medical School of Harvard University (Dataset-160)      [34]        MASCA-PSO                                           99.875%
Medical School of Harvard University (Dataset-255)      [34]        MASCA-PSO                                           99.61%


Because of these difficulties, alternative methods for creating datasets
are being considered. The generative adversarial network (GAN) is one
such method. With a GAN, a limited number of real images can be passed
through several training phases to obtain original and realistic
datasets. In this way, a unique dataset can be created without the need
for a lot of resources.


6.6 CONCLUSIONS 


In this chapter, studies that use EAs in the processing of 2D brain MR
images were examined. EAs are utilized in many image processing stages,
such as segmentation, tumor detection, feature extraction, and
classification. A closer look at these stages shows that EAs are used in
hybrid form with other methods. In addition to the original versions of
these algorithms, various modified versions appear in the studies. In
the studies reviewed, these algorithms are generally used to optimize
and improve other methods. This optimization has sometimes helped to
determine the optimum parameters of the method with which the EA is
combined, sometimes to improve the classification performance, and
sometimes to obtain more accurate results. As the studies examined show,
EAs have contributed significantly to work on brain images in the field
of biomedical image processing, as in other areas.


ABSTRACT


The scene graph is used to represent the semantics of images for visual
understanding. It has been used frequently for image retrieval and image
generation tasks. We develop a tool that generates a scene graph from a
single image and represents it in the Thai language. The methodology
contains three main steps: image captioning, scene graph parsing, and
machine translation. We also propose a chatbot application, developed
using Dialogflow, that demonstrates the use of the generated scene graph
data. The response is in JSON form, which can be processed further. The
image is submitted and the scene graph is generated. Then, the sentence
is translated into Thai, and the parts of the sentence are converted
into the Thai language using the PyThaiNLP library. We also report the
metric values of the machine translator and the caption generator. For
the translator model, we use BLEU, GLEU, WER, and TER scores.


7.1 INTRODUCTION 


The scene graph, proposed by", is a type of graph that represents the
relations between objects in an image. Each node in the graph represents
an object in the image. A leaf node can be physical, geometric, or
material, depending on the object type. The scene graph can be used to
represent the semantics of an image and is useful for image captioning,
image generation,'* image retrieval,” etc.

Figure 7.1 shows an image along with the caption “a black and white dog
playing with a frisbee.” Figure 7.2 shows an example of the scene graph
based on Figure 7.1. The scene graph contains the details of the
objects, relations, and attributes that are extracted from the image and
its description. “Dog” and “frisbee” are objects, while “plays” is a
relation.
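
The objects, relations, and attributes of this example can be written
down as a small JSON-style structure. The sketch below is illustrative
only: the key names ("objects", "relations", "attributes", "subject",
"predicate", "object", "attribute") are assumptions and not the schema
produced by any particular scene graph parser.

```python
import json

# Scene graph for "a black and white dog playing with a frisbee".
# All key names are hypothetical, chosen for readability.
scene_graph = {
    "objects": ["dog", "frisbee"],
    "relations": [
        {"subject": "dog", "predicate": "plays", "object": "frisbee"},
    ],
    "attributes": [
        {"subject": "dog", "attribute": "white"},
        {"subject": "dog", "attribute": "black"},
    ],
}


def to_triples(graph):
    """Flatten relations and attributes into (subject, predicate, object)."""
    triples = [(r["subject"], r["predicate"], r["object"])
               for r in graph["relations"]]
    triples += [(a["subject"], "is", a["attribute"])
                for a in graph["attributes"]]
    return triples


triples = to_triples(scene_graph)
serialized = json.dumps(scene_graph, indent=2)
```

The triples correspond to the statements "dog plays frisbee", "dog is
white", and "dog is black".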


[Image. Retrieved from: doglab.com]


a black and white dog playing with a frisbee .


FIGURE 7.1 Example image with caption. 


A scene graph generator is built on semantic understanding at the level
of sentences or images. Some researchers use more than one semantic
level to improve the accuracy of the generator. At the sentence level,
most current work is based on the English language; we have not yet
found a localization of the graph to other languages such as Thai. In
this chapter, we demonstrate the development of a scene graph generator
that can be applied to the Thai language. The derived scene graph data
set offers Thai developers an alternative for building applications such
as Thai image captioning. Given an image as the input of the generator,
the method proceeds as follows. The input image is given to the caption
generator, and the output sentence is fed to the scene graph parser,
which puts the information in the scene graph format. Finally, the
English scene graph is translated into Thai by a neural machine
translator.
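
The three steps above can be sketched as a chain of function calls. The
sketch is a structural illustration only: the three stand-in functions
(fake_captioner, fake_parser, fake_translator) are hypothetical
placeholders for the real caption generator, scene graph parser, and
machine translator, and the "<th:...>" output is a marker, not real Thai
text.

```python
def translate_graph(node, translate):
    """Translate every string value in a nested scene graph structure."""
    if isinstance(node, str):
        return translate(node)
    if isinstance(node, list):
        return [translate_graph(item, translate) for item in node]
    if isinstance(node, dict):
        # Keys are structural labels and are left untranslated.
        return {key: translate_graph(value, translate)
                for key, value in node.items()}
    return node


def image_to_thai_scene_graph(image_path, caption_image, parse_caption,
                              translate):
    """Image -> English caption -> English scene graph -> Thai scene graph."""
    caption = caption_image(image_path)
    english_graph = parse_caption(caption)
    return translate_graph(english_graph, translate)


# Hypothetical stand-ins so that the sketch runs without any model.
def fake_captioner(image_path):
    return "a black and white dog playing with a frisbee"


def fake_parser(caption):
    return {"objects": ["dog", "frisbee"],
            "relations": [["dog", "plays", "frisbee"]]}


def fake_translator(text):
    return f"<th:{text}>"


thai_graph = image_to_thai_scene_graph(
    "dog.jpg", fake_captioner, fake_parser, fake_translator)
```

Passing the three stages in as callables keeps the sketch independent of
any specific model or translation service.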


Object:     Dog, Frisbee
Relation:   Dog plays frisbee
Attribute:  Dog is white; Dog is black

Graph nodes: White, Black, Dog, Plays, Frisbee


FIGURE 7.2 Scene graph example. 


7.2 BACKGROUND 


The scene graph is a graph structure that describes the relations or
attributes between two objects.'’ There are various ways to develop a
scene graph generator.*° The first approach is to generate captions
using a convolutional neural network (CNN) and a recurrent neural
network (RNN);!*”° the caption is then passed to a graph parser, which
converts it into the scene graph. The second approach is to use object
detection together with attribute extraction and relation extraction to
generate the scene graph: the generator detects objects and then uses
feature extraction to convert the information into the scene graph.’

For basic tasks such as object detection and object recognition, COCO is
one of the popular data sets.** CNNs are widely used for image
recognition tasks, including face recognition,’ land use
classification,” object recognition,!! etc. Well-known CNNs include
AlexNet,'° GoogLeNet, ResNet,’ etc. For object recognition, a well-known
network is RCNN, whose performance was improved in Fast-RCNN® and
Faster-RCNN.” Much research has improved the performance of object
detection models, for example the single shot detector (SSD),"’
Faster-RCNN, and YOLO.”!

Li et al. presented Factorizable Net, a subgraph-based approach.
Subgraphs are generated from the fully connected graph, where the edges
can refer to similar phrase regions [17]. In their previous work [18],
they generated the scene graph from all objects and relations in an
image using a novel neural network model called MSDN. Yang et al. [30]
created Graph RCNN, which consists of three steps: object node
extraction, relationship edge pruning, and graph context integration.
These works [18, 30] used Visual Genome as the training data set.'°

Xu et al. (2017) applied an RNN with iterative message passing to
improve scene graph prediction. They predicted both objects and
relationships based on the Visual Genome data set. The model takes an
image as input and generates RPN proposals, which are passed through
their RNN inference model. The RNN contains GRU cells for nodes and
edges that are connected to each other. Messages are passed through
these nodes, pooled, and sent to the next nodes. Ref [29] presented the
Relationship Context-InterSection Region (CISC), which focuses on the
intersection regions of object bounding boxes for feature extraction.
The intersection may imply the interactive parts among objects.* The
approach is based on an RNN that utilizes memory and message passing.

Since scene graph generation requires a lot of training data, especially
relationships between objects in images, a large effort is needed to
label relationships, and relationship labels are often missing from data
sets. Chen et al. (2019) proposed a generative model to predict scene
graph labels from images with limited labels. The approach is based on
semi-supervised learning and can be used to create scene graph labels
and for predicate classification.

Table 7.1 summarizes the inputs and outputs of these previous works on
scene graph generation. Currently, scene graph data sets such as Visual
Genome are available only in English. This data set contains a lot of
object information for each image, such as the coordinates of each
object. However, data sets in the scene graph format still have no
support for other languages. In this chapter, we utilize an existing
scene graph generator and add steps for translation into a local
language.


TABLE 7.1 Previous Scene Graph Generation Approaches.


Model                                        Input                                        Output
F-Net [17]                                   Image and RPN proposal                       Object and relation
MSDN [18]                                    Image, object proposal, and region proposal  Caption and scene graph
RePN, aGCN [30]                              Image                                        Attentional graph, convolution network
Iterative message passing (Xu et al., 2017)  Image                                        Scene graph


7.3 METHODOLOGY 


Figure 7.3 describes the overall steps of this research. The scene graph
generator is made up of three elements: a caption generator, a scene
graph parser, and a translation machine.** First, we use the caption
generator model from Show and Tell: A Neural Image Caption Generator, a
public research project on GitHub.”® The caption generator consists of
an image encoder, a deep convolutional neural network initialized from
the Inception_v3 checkpoint, and hidden layers such as Long Short-Term
Memory (LSTM). To initialize the caption generator model, we use the
image caption generator’’ based on the COCO 2014 data set. We use
caption 2014 and image 2014 for the training, testing, and evaluation
data sets (256 records, 4 and 8, respectively).

The Inception_v3 checkpoint is used as the pretrained weights, and the
model is then trained on the COCO 2014 data set. Training runs for
1,000,000 epochs, which takes around 1 week on our machine with the
following specification: 8-core Intel(R) Xeon(R) CPU E5-2680 0 @
2.70GHz, 126 GB RAM, 1 TB hard disk, and 2 Tesla K40c GPUs (memory usage
12206MiB, power 235W).

The scene graph parser!’ is used to convert the English sentences into
the scene graph. The scene graph format includes relations and
attributes in JSON format. We use the Stanford Scene Graph Parser, which
supports both a rule-based parser and a classifier-based parser. At this
point, the machine translator is applied. In our case, Py-translate is
selected. The library supports the Python language and connects to the
Google Translate API. Google Translate is built on Google’s Neural
Machine Translation (GNMT) system, which is based on LSTM layers.


Image (JPEG format)
        v
Caption Generator (image to caption)
        v
Caption (text)
        v
Scene Graph Parser (caption to English scene graph)
        v
Scene Graph (English)
        v
Translator Machine (English scene graph to Thai scene graph)
        v
Scene Graph (Thai)


FIGURE 7.3 Overview of the data preprocessing procedure.


Note that there are two alternatives for applying the translator, and
the choice may affect the accuracy.*' In the first approach, the
translator machine takes the scene graph output by the scene graph
parser as its input and performs a word-for-word translation, as in the
top part of Figure 7.4. First, the sentence is separated into a list of
words. Then, each word is passed to the translator machine. Finally, the
resulting words are joined into a result sentence. The second option is
to apply the translator to the sentence that is the direct output of the
caption generator, a sentence-for-sentence translation, as in the bottom
part of Figure 7.4.


[Figure 7.4, top: word-by-word translation of [ Dog, plays, a ball ] into
a list of Thai words, which are joined into one Thai sentence. Bottom:
sentence-by-sentence translation of "Dog plays a ball." into a Thai
sentence.]


FIGURE 7.4 Example of each approach of translation step. 
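

The difference between the two approaches in Figure 7.4 can be sketched
as follows. The translator here is a hypothetical stand-in (toy_translate)
that only marks what it was asked to translate; it does not call any
real translation service, and the function names are assumptions made
for this example.

```python
def translate_word_for_word(words, translate):
    """Translate each word or phrase separately, then join the results.

    The pieces are joined without spaces, as in the top of Figure 7.4.
    """
    return "".join(translate(word) for word in words)


def translate_sentence_for_sentence(sentence, translate):
    """Translate the whole caption in a single call."""
    return translate(sentence)


calls = []  # records every request sent to the stand-in translator


def toy_translate(text):
    """Hypothetical translator: tags the input instead of translating it."""
    calls.append(text)
    return f"<th:{text.lower().rstrip('.')}>"


word_result = translate_word_for_word(["Dog", "plays", "a ball"],
                                      toy_translate)
sentence_result = translate_sentence_for_sentence("Dog plays a ball.",
                                                  toy_translate)
```

In the word-for-word case the translator sees each piece without its
context, which is one plausible reason for the two approaches to differ
in accuracy.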


7.4 EVALUATION 


Different evaluation metrics are used for each model. We then calculate
the overall system score as a weighted sum. For captioning, we evaluate
the performance of the caption generator with the Microsoft COCO Caption
Evaluation module.’ COCO val 2014, which includes 4369 records, is used
as the testing set.

The evaluation module includes the metrics BLEU, METEOR, ROUGE, and
CIDEr, which are reported in Table 7.2. For the machine translator, we
use NLPMetric.’ The sub-modules of NLPMetric are SPICE, GLEU, WER, and
TER.

Bilingual Evaluation Understudy (BLEU) counts the N-grams in the
resulting translation that overlap with the ground-truth translation.
GLEU, also called Google-BLEU, is the minimum of the N-gram precision
and recall. Recall is the number of matching N-grams divided by the
total number of N-grams in the reference; precision is the number of
matching N-grams divided by the total number of N-grams in the
prediction. Word error rate (WER), which is mainly used in speech
recognition, is calculated from the number of erroneous words in the
predicted sentence compared with the reference sentence. Translation
edit rate (TER) counts the number of word edits, that is, deletions,
additions, and substitutions. The score is the minimum number of edits
divided by the average length of the reference text.
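
As a rough illustration of two of these definitions, the sketch below
computes a sentence-level WER (word-level edit distance divided by the
reference length) and a sentence-level GLEU (minimum of N-gram precision
and recall for N = 1 to 4). It assumes whitespace-tokenized input and is
not the NLPMetric implementation used in the experiments; TER is not
covered.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Minimum number of word insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def ngram_counts(tokens, max_n=4):
    """Count all N-grams of length 1..max_n."""
    return Counter(tuple(tokens[i:i + n])
                   for n in range(1, max_n + 1)
                   for i in range(len(tokens) - n + 1))


def gleu(reference, hypothesis, max_n=4):
    """Sentence-level GLEU: min(precision, recall) over shared N-grams."""
    ref_counts = ngram_counts(reference.split(), max_n)
    hyp_counts = ngram_counts(hypothesis.split(), max_n)
    overlap = sum((ref_counts & hyp_counts).values())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    return min(precision, recall)
```

For Thai text, the sentences would first have to be split into
space-separated words, since these functions tokenize on whitespace.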

The Microsoft COCO Caption Evaluation includes BLEU, METEOR, ROUGE-L,
CIDEr, and SPICE. Metric for Evaluation of Translation with Explicit
Ordering (METEOR) is the harmonic mean of weighted unigram precision and
recall, and it includes stemming and synonym matching.

Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is modeled on
BLEU but pays more attention to N-gram recall than to precision.
Consensus-based Image Description Evaluation (CIDEr) measures the
similarity of a resulting sentence to a set of ground-truth sentences,
capturing the notions of grammaticality and saliency. Semantic
Propositional Image Caption Evaluation (SPICE) is the F1-score of scene
graph tuples.

Our experiment includes two options for using the translator machine, so
the evaluation module receives its input data in two ways: as a
predicted sentence from the word-for-word translation, or as a predicted
sentence from the sentence-for-sentence translation.

The module takes the predicted Thai sentence produced by the translator
machine and compares it with a reference sentence. Before feeding the
predicted sentence to the evaluation model, we use the Thai tokenizer
from PyThaiNLP to split it into words separated by spaces.

The results presented in Table 7.3 are based on TALPCo.'° The TALPCo
project was developed with Japanese as the main language, which was then
translated into other Asian languages. The English translation was done
by Japanese undergraduate students who had studied at an international
junior school, and it was rechecked by a native British English speaker.
The second version of this project supports the Thai language. The Thai
data set was rechecked by a Thai major student at Tokyo University.

Only the first 100 records of the data set are used for evaluation. The
evaluation data set is preprocessed by removing characters such as the
period from each sentence. An example from the TALPCo data set is “There
is a tree in the park.”, which is translated into Thai as “Hau'lwod
uaauasisas”.

In Table 7.2, the highest value for the caption generator is CIDEr, at
0.996. The second highest is BLEU-4, at 0.720. For the translator model,
in Table 7.3, the highest value is the WER average, at 5.000. The second
highest is the TER average, at 1.1082. Both of these values belong to
Model-1.


TABLE 7.2 The Result of Evaluation from Microsoft COCO Caption.


BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE  CIDEr  SPICE
0.320   0.419   0.552   0.720   0.258   0.538  0.996  0.183


TABLE 7.3 The Result of Evaluation from NLPMetric.


Model    BLEU avg  GLEU    WER avg  TER avg
Model-1  0.0000    0.0825  5.000    1.1082
Model-2  0.0480    0.2096  4.000    0.8122

The overall system score is calculated from two parts with equal 
weights: the CIDEr score represents the caption generator, and the GLEU 
score represents the translator machine. The overall system score is 
0.53925 for the first approach, word-for-word translation, and 0.6028 for 
the second, sentence-for-sentence translation. 
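
The arithmetic behind the two overall scores can be checked with a short sketch. It assumes, as the reported numbers suggest, that "equal weights" means the mean of the CIDEr score from Table 7.2 and the GLEU score from Table 7.3; the function and variable names are illustrative only.

```python
def overall_score(caption_score: float, translator_score: float) -> float:
    """Combine the two component scores with equal weights (0.5 each)."""
    return 0.5 * caption_score + 0.5 * translator_score


CIDER = 0.996  # caption generator score, Table 7.2
GLEU = {       # translator machine scores, Table 7.3
    "word-for-word": 0.0825,
    "sentence-for-sentence": 0.2096,
}

scores = {name: overall_score(CIDER, gleu) for name, gleu in GLEU.items()}
for name, score in scores.items():
    print(f"{name}: {score:.5f}")
```

Since the caption score is shared by both approaches, the difference between the two overall scores comes entirely from the translator component.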

The code and results are available on GitHub at 
https://github.com/Bell001/scene-graph-project.git. The repository contains 
an example implementation divided into the following folders: 


--CoreNLP 
--application-captures 
--helper 
--measures-model 
|--translator-machine 
|-- NLPMetrics 
|--test 
|--TALPCo 
|--translate-word 
|--process-model 
|--python-packages 
|--test-more 
|--Trans_data_result 


The folder “CoreNLP” contains the scene graph parser¹⁷ obtained from 
https://nlp.stanford.edu/software/scenegraph-parser.shtml. We took the 
pretrained model and adapted the deployment sample from it. 

We developed the Python packages in “python-packages” for Thai 
language manipulation. They contain scripts for Thai word splitting, for 
translating an English sentence into a Thai sentence, and for testing the 
translation. 

The folder “helper” contains JavaScript source code that translates 
the scene graph in the JSON file produced by the scene graph parser into 
Thai words, using the Python package above. We gather the list of 
relations in Thai in a dictionary that is used for the conversion. After 
the conversion, PyThaiNLP is used for Thai word splitting, and the Thai 
scene graph is saved in an output JSON file. 

The folder “measures-model” contains the code derived from NLPMetrics⁷ 
to compute the score of the translator machine. It also keeps the TALPCo 
data set in various languages, including Thai, which is used by the 
translation code. 

The code for the whole process is in the folder “process-model.” The 
steps are provided as shell scripts in the folder: 


1. Process the input image with the caption generator model²⁷ and 
write the resulting English captions to a text file. 

2. Pass the English caption text file to the scene graph parser,¹⁷ 
which generates the scene graph JSON file. 

3. Translate the scene graph JSON file into a Thai scene graph JSON 
file. 
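
Step 3 can be sketched as a dictionary lookup over the scene graph fields. This is a minimal sketch, not the repository's JavaScript helper: the field names follow the JSON shown in the chatbot figures, while the function name, the tiny lexicon, and the sample graph are illustrative assumptions.

```python
import copy
import json


def translate_scene_graph(graph: dict, lexicon: dict) -> dict:
    """Return a copy of the scene graph with words replaced via the lexicon.

    Words missing from the lexicon are kept unchanged.
    """
    def tr(word: str) -> str:
        return lexicon.get(word, word)

    out = copy.deepcopy(graph)  # leave the English graph untouched
    for obj in out.get("objects", []):
        obj["names"] = [tr(name) for name in obj.get("names", [])]
        obj["attributes"] = [tr(attr) for attr in obj.get("attributes", [])]
    for rel in out.get("relationships", []):
        rel["predicate"] = tr(rel["predicate"])
        rel["text"] = [tr(token) for token in rel.get("text", [])]
    return out


# Hypothetical English-to-Thai lexicon and scene graph, for illustration only.
lexicon = {"cat": "แมว", "cup": "ถ้วย", "drink from": "ดื่มจาก"}
graph = {
    "relationships": [
        {"text": ["cat", "drink from", "cup"],
         "predicate": "drink from", "subject": 0, "object": 1}
    ],
    "objects": [
        {"names": ["cat"], "attributes": [], "id": 0},
        {"names": ["cup"], "attributes": [], "id": 1},
    ],
}
thai_graph = translate_scene_graph(graph, lexicon)
print(json.dumps(thai_graph, ensure_ascii=False, indent=2))
```

The subject and object indices are left as they are, so the translated graph keeps the same structure as the English one.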


7.5 APPLICATION 


The chatbot application is implemented to demonstrate the usage of 
the scene graph generator. The chatbot is developed using Dialogflow 
connected with Facebook Messenger. Figure 7.5 shows how the chatbot 
works. In this application, the test sentence for the chatbot does not 
need to be grammatically correct, because we focus on the keywords in the 
test sentence: as long as the sentence contains a keyword registered in 
Dialogflow, the other words can be changed. When the model maps an image 
to the scene graph, the size, color channels, format, and color mode of 
the image affect the result of the caption generator. Currently, our model 
supports only JPEG images. 

In Figure 7.5, we test with the sentence “Tell me the meaning of 
image.” The sentence is sent to the webhook server, which connects to 
our model. The webhook then returns a scene graph and a caption of the 
image to Dialogflow as the response to the user. 
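
The webhook exchange can be sketched as a pure function that maps a Dialogflow request body to a response body. This is a sketch under assumptions: `queryResult`, `parameters`, and `fulfillmentText` are fields of the Dialogflow ES webhook format, but the `image_url` parameter name, the `describe_image` callback, and the fallback message are hypothetical and not taken from the authors' code.

```python
import json


def handle_webhook(request_body: dict, describe_image) -> dict:
    """Build a Dialogflow ES webhook response for an image request.

    describe_image is a hypothetical callable that takes an image URL and
    returns a (scene_graph_dict, caption_string) pair from the model.
    """
    query = request_body.get("queryResult", {})
    params = query.get("parameters", {})
    image_url = params.get("image_url")  # hypothetical parameter name
    if not image_url:
        return {"fulfillmentText": "Please send an image URL."}
    scene_graph, caption = describe_image(image_url)
    # Keep Thai characters readable instead of \u escapes.
    reply = json.dumps(scene_graph, ensure_ascii=False) + "\n" + caption
    return {"fulfillmentText": reply}
```

Keeping the handler free of any web framework makes it easy to test with a stub in place of the real captioning and translation pipeline.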
[Screenshot: Messenger conversation in which the user sends an image URL 
and the chatbot replies with a JSON scene graph containing “relationships” 
(text, predicate, object, subject), “url”, and “objects” (names, 
attributes, phrase, id), with values in Thai, followed by a Thai sentence.] 


FIGURE 7.5 Messenger Chatbot user response with scene graph and sentences (Example I). 
In Figure 7.5, the image used to create the scene graph is submitted at 
(1). The chatbot then replies in JSON format. The response contains 
objects and relationships in which each word is translated into Thai. At 
(3), the Thai sentence in the response means “a closed look of a cat 
drinking from the cup,” or “mazar ndveaauanaueinda” in Thai. 


[Screenshot: Messenger conversation in which (1) the user sends an image 
URL, (2) the chatbot replies with a JSON scene graph containing 
“relationships” (text, predicate, object, subject) and objects (names, 
attributes, phrase, id), with values in Thai, and (3) a Thai sentence 
follows.] 


FIGURE 7.6 Messenger Chatbot user response with scene graph and sentences (Example II). 
In Figure 7.6, the image submitted at (1) receives the response at (2), 
which contains three objects and two relationships. The whole sentence is 
translated as “a group of people sitting around the table,” or 
“nquauitiasou) Iazeiwis” in Thai. 


7.6 CONCLUSION 


We present a method for Thai scene graph generation and its usage in a 
chatbot. A scene graph contains the objects and relations extracted from 
the given image. The method consists of three steps: (1) image captioning, 
(2) scene graph parsing, and (3) machine translation. The performance is 
measured for each step, and the overall score combines the step scores 
with equal weights. In our experiment, the translator machine is used in 
two ways. Sentence-for-sentence translation gives a higher overall score 
than word-for-word translation. The translator evaluation score indicates 
how correctly the system translates, so the second approach performs 
better as a translator machine. 

The resulting Thai scene graphs show that our scene graph generation 
model is limited in the accuracy of its Thai output. The caption generator 
is the sub-model with the largest impact on the result, because our model 
converts the best sentence output by the caption generator into a scene 
graph. In our experiment, the model still cannot cover general information 
because of the limited training data. With this model, the demonstration 
works on a small group of detected objects and their relations in images 
derived from the COCO data set. If a larger data set becomes available, 
the approach can be used to generate sentences covering more classes of 
objects. 


KEYWORDS 


• chatbot 

• scene graph 

• deep learning 

• caption generation 




REFERENCES 


1. Chantrapornchai, C.; Duangkaew, S. In Handbook of Research on Deep Learning 
Innovations and Trends; A. E. Hassanien et al., Eds.; IGI Global: Hershey, PA, 2019; 
pp 40-58. 

2. Chen, X.; Fang, H.; Lin, T. Y.; Vedantam, R.; Gupta, S.; Dollar, P.; Zitnick, C. L. 
Microsoft COCO Captions: Data Collection and Evaluation Server, 2015. 
http://arxiv.org/abs/1504.00325 

3. Chowdhary, C. L. Linear Feature Extraction Techniques for Object Recognition: 
Study of PCA and ICA. J. Serbian Soc. Comput. Mech. 2011, 5(1), 19-26. 

4. Chowdhary, C. L. Intelligent Systems: Advances in Biometric Systems, Soft 
Computing, Image Processing, and Data Analytics; Apple Academic Press, 2019. 

5. Chowdhary, C. L.; Acharya, D. P. Segmentation and Feature Extraction in Medical 
Imaging: A Systematic Review. Procedia Comput. Sci. 2020, 167, 26-36. 

6. Chowdhary, C. L.; Goyal, A.; Vasnani, B. K. Experimental Assessment of Beam 
Search Algorithm for Improvement in Image Caption Generation. J. Appl. Sci. Eng. 
2019, 22(4), 691-698. 

7. Cunha, G. (n.d.). NLPmetrics. https://github.com/gcunhase/NLPMetrics 

8. Girshick, R. B. Fast R-CNN, 2015. CoRR, abs/1504.08083. 
http://arxiv.org/abs/1504.08083 

9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition, 
2015. CoRR, abs/1512.03385. http://arxiv.org/abs/1512.03385 

10. Hiroki, N.; Okano, K.; Wittayapanyanon, S.; Nomura, J. Interpersonal Meaning 
Annotation for Asian Language Corpora: The Case of TUFS Asian Language 
Parallel Corpus (TALPCo). Proceedings of the Twenty-Fifth Annual Meeting of the 
Association for Natural Language Processing, 2019. 

11. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks, 2017. CoRR, 
abs/1709.01507. http://arxiv.org/abs/1709.01507 

12. Johnson, J.; Gupta, A.; Li, F. F. Image Generation from Scene Graphs, 2018. CVPR 
2018. 

13. Johnson, J.; Krishna, R.; Stark, M.; Li, L.; Shamma, D. A.; Bernstein, M. S.; Fei-Fei, L. 
Image Retrieval Using Scene Graphs. In 2015 IEEE Conference on Computer Vision and 
Pattern Recognition (CVPR), 2015; pp 3668-3678. DOI: 10.1109/CVPR.2015.7298990 

14. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon, 
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid 
Classifier Model for Intrusion Detection. Electronics 2020, 9(4), 692. 

15. Krishna, R.; Zhu, Y.; Groth, O.; Johnson, J.; Hata, K.; Kravitz, J.; Fei-Fei, L. Visual 
Genome: Connecting Language and Vision Using Crowdsourced Dense Image 
Annotations, 2016. https://arxiv.org/abs/1602.07332 

16. Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet Classification with Deep 
Convolutional Neural Networks. In Advances in Neural Information Processing 
Systems 25; Pereira, F., Burges, C. J. C., Bottou, L., Weinberger, K. Q., Eds.; Curran 
Associates, Inc., 2012; pp 1097-1105. 

17. Li, Y.; Ouyang, W.; Zhou, B.; Shi, J.; Zhang, C.; Wang, X. Factorizable Net: An 
Efficient Subgraph-Based Framework for Scene Graph Generation. ECCV, 2018. 




18. Li, Y.; Ouyang, W.; Zhou, B.; Wang, K.; Wang, X. Scene Graph Generation from 
Objects, Phrases and Region Captions. ICCV 2017, 2017. 

19. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. E.; Fu, C.; Berg, A. C. 
SSD: Single Shot Multibox Detector, 2015. CoRR, abs/1512.02325. 
http://arxiv.org/abs/1512.02325 

20. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep 
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring. 
Comput. Commun. 2020. 

21. Redmon, J.; Divvala, S. K.; Girshick, R. B.; Farhadi, A. You Only Look Once: 
Unified, Real-Time Object Detection, 2015. CoRR, abs/1506.02640. 
http://arxiv.org/abs/1506.02640 

22. Ren, S.; He, K.; Girshick, R. B.; Sun, J. Faster R-CNN: Towards Real-Time Object 
Detection with Region Proposal Networks, 2015. CoRR, abs/1506.01497. 
http://arxiv.org/abs/1506.01497 

23. Schuster, S.; Krishna, R.; Chang, A.; Fei-Fei, L.; Manning, C. D. Generating 
Semantically Precise Scene Graphs from Textual Descriptions for Improved Image 
Retrieval. Proceedings of the Fourth Workshop on Vision and Language, 2015. 

24. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S. E.; Anguelov, D.; Rabinovich, 
A. Going Deeper with Convolutions, 2014. CoRR, abs/1409.4842. 
http://arxiv.org/abs/1409.4842 

25. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; 
Zitnick, C. L. Microsoft COCO: Common Objects in Context. In Computer Vision - 
ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer 
International Publishing: Cham, 2014; pp 740-755. 

26. Tsutsui, S.; Kumar, M. Scene Graph Generation from Images, 2017. 
http://vision.soic.indiana.edu/b657/sp2016/projects/stsutsui/paper.pdf 

27. Vinyals, O.; et al. Show and Tell: Lessons Learned from the 2015 MS-COCO Image 
Captioning Challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39(4), 652-663. 

28. Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D. Show and Tell: A Neural Image Caption 
Generator, 2014. CoRR, abs/1411.4555. http://arxiv.org/abs/1411.4555 

29. Wang, Y. S.; Liu, C.; Zeng, X.; Yuille, A. Scene Graph Parsing as Dependency Parsing. 
NAACL 2018, 2018. 

30. Yang, J.; Lu, J.; Lee, S.; Batra, D.; Parikh, D. Graph R-CNN for Scene Graph 
Generation. ECCV 2018, 2018. 

31. Yngve, V. H. Sentence-for-Sentence Translation. Mechanical Translation 1955, 2(2), 
29-37. http://www.mt-archive.info/MT-1955-Yngve.pdf 

32. Zhang, C. Deep Learning for Land Cover and Land Use Classification (Doctoral 
Dissertation), 2018. DOI: 10.17635/lancaster/thesis/428 




CHAPTER 8 


Credit Score Improvisation through 
Automating the Extraction of Sentiment 
from Reviews 


AADIT VIKAS MALIKAYIL¹, MAHESWARI R.²*, AZATH H.³, and 
SHARMILA P.⁴ 


¹,²VIT Chennai, Chennai, India 
³VIT Bhopal, Bhopal, India 
⁴Sri Sairam Engineering College, Chennai, India 


*Corresponding author. E-mail: maheswari.r@vit.ac.in 


ABSTRACT 


Credit rating firms such as D&B and A.M. Best Company usually score 
companies based on their bank records, looking only at financial details 
such as whether the company defaulted on repaying its loans. 
Text/sentiment analysis can improve the decisions banks make before 
lending to their customers, and it enables businesses to grow profitably 
by providing information-based intelligence tools. The mission of this 
work is to extract the unstructured data from websites (i.e., Glassdoor, 
Indeed) that host company reviews. The objective is to automate the 
extraction of the aspects and their corresponding sentiments and to 
accumulate a credit score. The proposed prototype accepts text input 
either manually or via a text file of stacked reviews. These reviews are 
tokenized into words and categorized into nouns and adjectives. The 
adjectives are assigned their respective class values/polarity (in binary 
form). The overall goal is to make use of company information available 
on the Internet that has so far been unaccounted for. This kind of 
information is extracted from public websites such as kanoon.com, 
Glassdoor, and Indeed.in. The rating is therefore based not only on bank 
records but also on how the company treats its employees, sanitation 
issues, pay problems if any, the perks given to employees, and so on. 
Reviews given by consumers about the company's performance in the market 
are also considered for processing. The sentiments/adjectives attached to 
all noun forms are recorded and given binary score values. The cumulative 
score of a sentence or a paragraph is then stored in a database. A pivot 
table is then generated, which displays a frequency table of the noun 
forms and the sentiments used to describe them. The number of times a 
noun form has a positive/negative sentiment is recorded, and a score is 
displayed to the user. Accuracy values for both text analysis algorithms 
have been analyzed, and the better one, the TextBlob analyzer, has been 
put to use since it had accuracy values above 95% for positive sentiments 
and 91% for mixed sentiments. 


8.1 INTRODUCTION 


Businesses make a habit of checking the credit score of a company, for 
instance, when they invest in the company or purchase its shares. The 
credit score gives enough data to check the company's compliance with 
assets and taxes.'* However, a great deal of information about companies 
is also available on websites such as Indeed.in and Glassdoor, which 
motivates this proposal. This unstructured data can be analyzed further to 
bring more value to the valuations of credit rating agencies (CRAs). 
However, websites like Glassdoor do not allow Python packages to scrape 
their pages. An alternative is to use Google Chrome extensions that scrape 
portions of these pages and save them in a CSV file, but this mere 
scraping is not efficient for the use case.** The reason is that the 
system needs to collect people's opinions of a company in order to analyze 
the company's worth qualitatively within a year or two.** Figure 8.1 shows 
a sample screen in which Glassdoor denies access to scraping. 


8.1.1 OBJECTIVE 


This work takes the free, unstructured data into consideration and uses 
it for scoring. Sometimes investors do not trust the credit score provided 
by the CRA. 
E:\softwares\envs\py3\lib\urllib\request.py in error(self, proto, *args) 
    568         if http_err: 
    569             args = (dict, 'default', 'http_error_default') + orig_args 
--> 570             return self._call_chain(*args) 
    571 
    572 # XXX probably also want an abstract factory that knows when it makes 

E:\softwares\envs\py3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args) 
    502         for handler in handlers: 
    503             func = getattr(handler, meth_name) 
--> 504             result = func(*args) 
    505             if result is not None: 
    506                 return result 

E:\softwares\envs\py3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs) 
    648 class HTTPDefaultErrorHandler(BaseHandler): 
    649     def http_error_default(self, req, fp, code, msg, hdrs): 
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp) 
    651 
    652 class HTTPRedirectHandler(BaseHandler): 

HTTPError: HTTP Error 403: Forbidden 


FIGURE 8.1 Glassdoor bans access to scraping. 
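
The traceback in Figure 8.1 is what `urllib` raises when a server answers with HTTP 403. A scraper can detect this case and skip the site instead of crashing. The following is a minimal sketch, not the authors' code; the function name and the injectable `opener` argument are assumptions made so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, opener=urllib.request.urlopen):
    """Return (status_code, body) for a URL; body is None on an HTTP error.

    opener defaults to urllib.request.urlopen and can be replaced by a
    stand-in callable for testing.
    """
    try:
        with opener(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # For example 403 Forbidden when the site refuses automated access.
        return err.code, None
```

A caller can then check whether the returned status is 403 and fall back to another source, such as Indeed, rather than retrying.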


One reason is that companies believe the score might not be up to date, 
or that it could have been tampered with. Hence, this work looks at 
getting the most recent data and generating a score that could add value 
to the general credit score. The proposed system uses Python and its 
Natural Language Toolkit (NLTK) to perform text/sentiment analysis. It 
uses data frames from the Pandas package to create tables out of lists and 
dictionaries, which gives organized and easier access to the data when the 
results are computed. The PorterStemmer from the NLTK package is used to 
stem words. For instance, “dread,” “dreadful,” and “dreadfulness” are all 
treated as the word “dread” when computing “dread” as a sentiment. Thus, 
the mission of the work is to extract the unstructured data from websites 
(i.e., Glassdoor, Indeed.in) that host company reviews. The objective is 
to automate the extraction of the aspects and their corresponding 
sentiments and to accumulate a credit score. The score produced will add 
value to the credit score generated by the CRAs. 
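
The effect of stemming on sentiment counting can be illustrated with a toy suffix stripper. This is deliberately not the Porter algorithm used by NLTK's PorterStemmer; it is a simplified stand-in with a hypothetical suffix list, meant only to show how variants of a word collapse into one count.

```python
from collections import Counter

# Hypothetical suffix list, longest first so "fulness" is tried before "ness".
SUFFIXES = ("fulness", "ness", "ful")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Remove one known suffix if at least min_stem characters remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


words = ["dread", "dreadful", "dreadfulness"]
counts = Counter(strip_suffix(w) for w in words)
print(counts)
```

With stemming, the three surface forms contribute to a single sentiment entry instead of three separate ones.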


8.2 PROPOSED SYSTEM PLANNING 


First, the system needs to take care of the organization of the scraped 
data. The unstructured data collected needs to be well ordered, since the 
text sentiments describe particular aspects. Then the sentiment needs to 
be determined, followed by consideration of the interaction factor. Figure 
8.2 shows the data flow for calculating the accuracies from the training 
text files. The proposed system obtains accuracies from two training text 
files, pos_sentiment.txt and mixed_sentiment.txt, which are processed with 
two sentiment analyzers: TextBlob (Naive Bayes) and Vader (rule-based). 


[Flowchart: creation of two training files; each file is trained under 
TextBlob Sentiment and under Vader (Rule-Based); both branches lead to 
accuracies.] 


FIGURE 8.2 Calculation of accuracies data flow. 


In this system, the user is required to input the number of pages he/she 
would like to scrape. Once the number of pages is set, the automated 
scraping of the pages begins, so that the latest reviews provide valuable 
insight. Once the scraping is done, the next step is to perform sentiment 
analysis on the file containing the scraped data. This system was designed 
to add to the value of what CRAs offer. As noted in some literature 
surveys, investors do not entirely trust the CRAs.’”* The data collected 
might sometimes be a couple of years old, dating from when the company 
indeed had a good/bad credit score.’ Otherwise, the agencies end up giving 
incomplete information that is not very useful for predicting the 
company's future.'*!! The user can take advantage of the automated 
scraping to collect and analyze data for practically any company among the 
many registered on the website indeed.com.'? °° 
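
Turning the user's page count into a list of review URLs can be sketched as follows. This is a sketch under assumptions: the `start` and `sort` query parameters follow the scraping code shown later in the chapter, while the base URL, the page size of 20, and the sort value are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; a real run would use the company's review page.
BASE_URL = "https://example.com/cmp/Some-Company/reviews"


def review_page_urls(pages: int, page_size: int = 20, sort: str = "rating_desc"):
    """Build one URL per requested page using an offset-based start parameter."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    urls = []
    for page in range(pages):
        query = urlencode({"start": page * page_size, "sort": sort})
        urls.append(f"{BASE_URL}?{query}")
    return urls


for url in review_page_urls(3):
    print(url)
```

Each URL can then be passed to the scraping routine in turn, so the number of requests is bounded by the page count the user entered.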


8.3 SYSTEM DESIGN 


The system uses Python 3.6 and other necessary packages. The design is 
intended to add value to the traditional methods of credit score 
calculation. The solution makes use of the freely available unstructured 
data on websites such as Indeed and Glassdoor. The system helps users gain 
insight into company performance not only quantitatively but also 
qualitatively. This work uses two test files: one is stacked entirely with 
positive-sentiment reviews, and the other is stacked with test data for 
mixed sentiments. The data were collected by automating the scraping 
process: the positive reviews were obtained by sorting the text reviews in 
descending order, and the mixed reviews by sorting them in ascending 
order. These two data sets are used to obtain accuracies for the two 
algorithms used, namely TextBlob and Vader Sentiment Analyzer. For the 
sentiment analysis module, the system imports the packages required for 
both algorithms. 


8.3.1 VADER SENTIMENT ANALYZER 


The Vader Sentiment Analyzer package returns, for any given text, float 
scores under the keys “pos,” “neg,” “neu,” and “compound”; the compound 
score ranges from “−1” (most negative) to “+1” (most positive). This work 
uses the compound value for better estimation, because a sentence can 
carry a mix of sentiments on one or more aspects. The following examples 
illustrate sentences of this kind. 


Examples: 
• I do not like the work experience here, but I am pleased about 
the salary. 


• The salary is not that good, but the free food menu does not 
cease to surprise me. 
Sentences like these deserve neither a highly positive score nor a 
highly negative one. Hence, the system uses the compound score to obtain a 
net score that does justice to the output. Further, a graph is used to 
represent the polarity scores collected over a period of time (as on the 
Indeed.in website). In addition, the proposed system makes sure that only 
reviews posted in the current financial year are scraped. This ensures 
that the scoring is based on recent data, since data that is too old may 
distort the results. 
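
How a compound score stays within −1 and +1 can be sketched with the normalization VADER applies to the summed word valences, x / sqrt(x² + α) with α = 15. The sketch below is a simplification: it skips VADER's lexicon lookup and its rules for negation, intensifiers, and punctuation, and the ±0.05 cut-offs are the commonly used convention rather than something stated in this chapter.

```python
import math


def normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the interval [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, score))


def label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical valence sums: strongly positive, mixed, strongly negative.
for total in (4.0, 0.1, -4.0):
    compound = normalize(total)
    print(total, round(compound, 4), label(compound))
```

A sentence whose positive and negative words nearly cancel ends up with a compound score close to zero, which matches the intent of not rating mixed reviews as highly positive or highly negative.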


8.3.2 TEXTBLOB 


The TextBlob algorithm produces two kinds of scores, the subjectivity 
score and the polarity score. The subjectivity score helps to identify how 
many lines of the text file are opinionated and how many relate to facts. 
The polarity score, as the name suggests, gives the user an idea of 
whether the text is positive or negative. As with Vader, a graph/plot is 
made from these polarity values. The diagrammatic representation of score 
analysis using TextBlob is shown in Figure 8.3. 


[Diagram: cumulative sentiment score visualized over a graph; cumulative 
subjectivity score visualized over a graph.] 


FIGURE 8.3 Score analysis using TextBlob. 
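
The cumulative scores named in Figure 8.3 are running totals over the per-review scores. The following is a minimal sketch with hypothetical polarity and subjectivity values; in the actual system these would come from TextBlob, and the resulting lists would be passed to a plotting library.

```python
from itertools import accumulate


def running_totals(values):
    """Return the cumulative sum after each review, in review order."""
    return list(accumulate(values))


# Hypothetical per-review scores in the order the reviews were scraped.
polarities = [0.5, -0.2, 0.8, 0.0, -0.4]
subjectivities = [0.6, 0.9, 0.7, 0.1, 0.8]

cumulative_polarity = running_totals(polarities)
cumulative_subjectivity = running_totals(subjectivities)
print(cumulative_polarity)
print(cumulative_subjectivity)
```

A rising cumulative polarity curve indicates predominantly positive reviews, while a flat or falling stretch marks a run of neutral or negative ones.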
8.4 IMPLEMENTATION OF THE POSITIVE SENTIMENT REVIEWS 


The training data set for positive-sentiment reviews is considered 
first. All these data were scraped from the Indeed.in website, with the 
ratings in descending order as specified by the URL, and the soup was 
created accordingly. A screenshot of the created soup is shown in Figure 
8.4. 


soup = make_soup("https://www.indeed.co.in/cmp/Samsung-Electronics-9/reviews?start=" + str(page_num) + "&sort=rating_asc") 
# for the next button 
nxt_btn = soup.find(attrs={"class": "cmp-Pagination-link cmp-Pagination-link--nav"}) 


Excerpt of the positive-sentiment training text file: 
b'Resolved vendor management queries and did trouble shooting with respect to supply chain processes, 

Learned how to effectively communicate with stakeholders and establish a rapport with them. 

Workplace is fun and there's good work life balance. .very good place for work. 

friendly environment. worked as a team, everyone is always ready to help each other. 

productive work and gives very good pay. overall everything is good. Every day is Day 1 at Amazon. 

You learn something new each day.Very Sophisticated workplace. 

Big offices, great cafeteria food, cab benefits, yearly bonuses, good managers and a fun place to consider it as a 2nd home. 

It's a huge family..I would like to tell all about Amazon that its a big brand company. 

It is think a lot about their customers as well employers. The atmosphere of the Amazon is friendly. 

Everyone treats to each other like a friend. Love you Amazon & Team. Productive with amiable environment. 

The fun part was interacting with the customers to know more about their day to day problems and some positive feedbacks about amazon ar 
It a good place to brush your skills. I had great time working there. 

Its flexible they are strict abt leaves. You need to book leaves in prior if you know the dates. 

The Amazon is a good place to work with so talented people and to enhance our knowledge and by working there you will improve your workir 
Awesome place to work and learn. The work culture in amazon is great.People are always work driven and love learning and teaching new ski] 
Life changing experience, Everyday is day zero, you learn everyday,management was helpful & they actually see the positive part in a persc 
The work culture is flexible and open-minded.2. The management takes care of the company needs in an efficient manner, 

The number of hours of work do not exceed the daily hours of operation unless you require to work over time. 

Lot of benefits in accordance with the amount of work and quality of work we do. Biggest retailer. Best place to work. 

Attractive Salary packages in India. Work Culture is great, and high opportunities to grow in the company itself.’ 

The work environment of the company is very good.People feel proud to be part of the company provide full support to their employees in f 
A good place to work over all with a friendly environment. Flexible to work hours though little long. 

Good cafetaria though varities are less but tastly friendly environment , good & well trained employee , also senior are very helpfull, 
Anazing place to work.Amazon is an organisation's which treat it's employees very well. 

Amazon offers great learning and developing opportunities to its employees and also give opportunities to work onsite (overseas). 

Good environment to job in Amazon Good team work.sopting to our team and TL and supervisor . 


FIGURE 8.4 Created soup for the positive sentiments text file. 


Sample reviews with the best ratings appearing first are shown in 
Figure 8.5. For the mixed-sentiment training file, the reviews are scraped 
in ascending order instead, so that the lowest-rated reviews appear first. 
Found 1,176 reviews matching the search. See all 6,958 reviews. 


4.0 productively and fun workplace 
SEC. (SAMSUNG EXPERIEANCE CUNSULTANT (Former Employee) — INDIA , CHENNAI — October 8, 2019 

what ever, ther have stress, pressure, but above that i enjoyed the job..i learned lot new 
technologies updated and customer mind set.. management was favurable to staff.. but 
target is the hardest part of the job.. overall i like the job.. 


4.0 Good 
Technician (Former Employee) — Puttur, Karnataka — September 20, 2019 

Good training provide and good timing working hours friendlly treat the company . 
give uniform id card ok happy one year completed 
Thank you samsung 


3.0 Fun Workplace 
Software Engineer (Current Employee) — Noida, Uttar Pradesh — September 19, 2019 

Since it is flexible in work timings and you have to leave laptop at office, a great work 
life balance is maintained. But then, work from home is not there 


FIGURE 8.5 Positive user reviews. 


Next, the system looks at the reviews scraped in ascending order for 
the mixed sentiments. A sample screenshot of the reviews is shown in 
Figure 8.6. Further, the prototype looks at the accuracy obtained by using 
the Naive Bayes algorithm with TextBlob. Of the two TextBlob measures, 
polarity and subjectivity, this system used the polarity to obtain the 
accuracies. The subjectivity can be used to determine how many 
fact-oriented or opinion-oriented sentiments exist in the input file. 
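
Computing an accuracy from polarity values can be sketched as counting how many reviews fall on the expected side of a threshold. The decision rule and the threshold of zero are assumptions, since the chapter shows only the printed percentages and sample sizes; the counts 1256 and 1024 used below are inferred from those percentages and are not reported in the source.

```python
def accuracy(polarities, expect_positive: bool, threshold: float = 0.0):
    """Return (percentage of reviews on the expected side, sample count)."""
    count = len(polarities)
    if count == 0:
        raise ValueError("at least one polarity value is required")
    if expect_positive:
        correct = sum(1 for p in polarities if p > threshold)
    else:
        correct = sum(1 for p in polarities if p <= threshold)
    return 100.0 * correct / count, count


# Hypothetical polarity values for four reviews from the positive file.
pct, n = accuracy([0.4, 0.1, -0.2, 0.6], expect_positive=True)
print("Positive accuracy = {}% via {} samples".format(pct, n))

# Inferred counts that reproduce the reported TextBlob percentages.
print(round(100.0 * 1256 / 1309, 2), round(100.0 * 1024 / 1118, 2))
```

The same function can be applied to the scores of either analyzer, which keeps the comparison between TextBlob and Vader on an equal footing.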


8.5 RESULTS AND DISCUSSION 


The first pair of results are the accuracies for the two approaches used 
to calculate the sentiment scores, namely TextBlob and Vader Sentiment 
Analysis. The accuracy attained using the TextBlob module, a positive 
accuracy of 95.95% over 1309 samples and a mixed sentiment accuracy of 
91.59% over 1118 samples, is shown in Figure 8.7. 


Found 1,176 reviews matching the search. See all 6,958 reviews. 


1.0 Worked as Engineer 
Engineer (Former Employee) — Noida, Uttar Pradesh — November 27, 2017 

I worked at Samsung India Electronics full-time 
Pros 
- Relaxed work environment 
- Good bus/food facility 
- Location is good 
- Work/Life balance is awesome 
Cons 
- Less learning as a Software Developer 


1.0 Not a good organization to work with. 
Regional Manager (Current Employee) — Chennai, Tamil Nadu — August 11, 2017 

Not a good organization to work with. No direction. No work ethic. No thought proces 
Forward thinking is lacking. Discrimination in various form. Overall useless 
management. 
Pros 
Brand name 
Cons 


FIGURE 8.6 Mixed user reviews. 


Using the Vader Sentiment Analyzer, the system obtained the accuracies 
shown in Figure 8.8: a positive accuracy of 89.83% over 1309 samples and a 
mixed sentiment accuracy of 69.85% over 1118 samples. It is inferred that 
the TextBlob module gives better accuracy, whereas the rule-based Vader 
Sentiment Analyzer is less accurate on both sets, most notably on the 
mixed sentiments. 


print("Mixed sentiment accuracy = {}% via {} samples".format(neg_correct/neg... 


Positive accuracy = 95.95110771581368% via 1309 samples 
Mixed sentiment accuracy = 91.59212880143113% via 1118 samples 


FIGURE 8.7 Accuracy output using TextBlob. 


print("Mixed sentiment accuracy = {}% via {} samples".format(neg_correct/neg_co... 


Positive accuracy = 89.83957219251337% via 1309 samples 
Mixed sentiment accuracy = 69.85688729874776% via 1118 samples 


FIGURE 8.8 Accuracy output using Vader Sentiment. 


As can be observed, the polarities for the mixed sentiments lie slightly 
on the positive side, close to zero. The reason is that, as observed while 
scraping the reviews for training, many employees who give the company a 
bad rating also mention good points, to compensate and to keep their 
identity safe. 


8.5.1 ACCURACIES CALCULATED 


Figure 8.9 shows the output file created by automating the scraping 
process; to ensure data quality, only reviews from the current year are 
scraped. Next, the system measures the accuracies for both approaches. A 
simple comparison shows that the accuracy of TextBlob is higher than that 
of Vader. 


[Screenshot: the scraped review text file opened in a text editor.]


FIGURE 8.9 Reviews vs scraping.


The reason is that Vader uses a rule-based implementation for sentiment
analysis, whereas TextBlob uses the Naive Bayes classifier algorithm,
which proved more accurate here. The scraped reviews in Figure 8.9 are the
input on which the sentiment analysis is performed. The scores obtained
from these data are used to create graphs that give the consumer a complete
insight by showing how often the graph reaches positive and negative peaks.


8.5.2 VADER SENTIMENT VISUALIZATION


The graph produced by the Vader Sentiment Analyzer is shown in Figure
8.10. Notice that the peaks have blunt edges spread over a range of values.
The graph peaks at the extreme positive and negative polarity values, and
marking “0” as the centre of the y-axis makes this even more apparent.


[Plot: polarity (y-axis, -0.6 to 0.6) against days (x-axis, 0 to 250).]


FIGURE 8.10 Vader Sentiment Analyzer.


8.5.3 TEXTBLOB VISUALIZATION 


Figure 8.11 shows the graph produced with the TextBlob package using
the Naive Bayes classifier algorithm. Here the positive and negative peaks
are pointed, which gives more accurate results.

The data are always collected for the year in which the user runs the
scraper. For instance, if the user scrapes in 2019, only reviews from 2019
are collected; the user enters the page offset as a multiple of 20. Figure
8.12 shows the resulting output: the dates and page offsets of the reviews
scraped.
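
The two rules described above (page offsets in multiples of 20, and keeping only reviews from the scraping year) can be sketched as follows. The URL path format and the date format are taken from the output in Figure 8.12; the function names, the constant, and the assumption that month names are parsed in an English locale are illustrative and not part of the original system.

```python
from datetime import datetime

REVIEWS_PER_PAGE = 20  # offsets are entered in multiples of 20, per the text


def review_page_path(company: str, start: int) -> str:
    """Build the relative review URL for a page offset (a multiple of 20)."""
    if start < 0 or start % REVIEWS_PER_PAGE != 0:
        raise ValueError("start must be a non-negative multiple of 20")
    return f"/cmp/{company}/reviews?start={start}"


def keep_year(date_strings, year):
    """Keep only review dates such as 'October 16, 2019' that fall in `year`."""
    kept = []
    for text in date_strings:
        # %B parses the full month name; this assumes an English locale.
        parsed = datetime.strptime(text, "%B %d, %Y")
        if parsed.year == year:
            kept.append(text)
    return kept


print(review_page_path("Oyo", 20))
print(keep_year(["October 16, 2019", "August 11, 2017"], 2019))
```

Validating the offset before building the URL mirrors the prompt shown in Figure 8.12, which asks the user for a number in multiples of 20.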


8.5.4 SCALE OF SUBJECTIVITY OF REVIEWS 


The sentiment module reads lines from the text file indeedreviews.txt
and analyzes them with two algorithms: the rule-based Vader Sentiment
Analyzer, and the TextBlob classifier, which works on top of the Naive
Bayes classifier.


[Plot: polarity (y-axis, -0.75 to 1.00) of the scraped reviews (x-axis, 0 to 120).]


FIGURE 8.11 TextBlob Analyzer.


Enter a number in multiples of 20 :


this is page 0

October 16, 2019
/cmp/Oyo/reviews?start=20
October 27, 2019
/cmp/Oyo/reviews?start=20
October 17, 2019
/cmp/Oyo/reviews?start=20
October 14, 2019
/cmp/Oyo/reviews?start=20
October 14, 2019
/cmp/Oyo/reviews?start=20
October 10, 2019
/cmp/Oyo/reviews?start=20
October 9, 2019
/cmp/Oyo/reviews?start=20
October 6, 2019
/cmp/Oyo/reviews?start=20
October 4, 2019


FIGURE 8.12 Output: Dates and page no. of the reviews scraped.
Figure 8.13 shows an example of the subjectivity of the text. Values
close to 0.0 indicate objective text, and values close to 1.0 indicate
more subjective text.


[Console output: a list of per-review subjectivity scores between 0.0 and 1.0, beginning 0.5, 0.0, 0.0, 0.3, 0.0, 0.3, 0.6000000000000001, 0.6000000000000001, 0.6, 0.0, 1.0, 0.3, ...]


FIGURE 8.13 Text subjectivity sample.


8.5.5 SCALE OF POLARITY OF REVIEWS 


The analysis uses the polarity values received at the output, shown in
Figure 8.14. For instance, the system takes the word “great” for further
analysis. As observed, the figure gives probabilistic values for the
polarities and subjectivity. The compound values are the ones used in
the proposed analysis. Again, the system would probably not use the
Vader approach, since it gives lower accuracy than the TextBlob module.
Figure 8.15 shows a sample on the polarity scale.


8.5.6 LIMITATION 


The proposed system has no login module, since its functionality is
limited to scraping and text analysis. The system was made to aid and
bring value to the credit scores calculated by the CRAs. As of now, the
proposed work only scrapes company reviews from the Indeed.in website.
One limitation is that it cannot yet scrape other websites such as
Glassdoor.com, where access is forbidden and Python packages are not
permitted to scrape the pages. Secondly, the project does not treat data
integrity as a priority: it places no limit on the amount of data scraped,
and the integrity of that data is not checked. Large volumes of freely
available unstructured data are collected for analysis, so adding
permissions to these files is not considered a necessity.
FIGURE 8.14 Polarity vs subjectivity.
FIGURE 8.15 Scale of polarity review sample.


8.6 CONCLUSION AND FUTURE WORK 


In conclusion, the proposed system can bring value to customers looking
for legitimate investments, and more relevant information can be gained
from its outputs. Once the peak values are read from the graphs, all
values can be recorded and an estimate can be calculated. Many companies
doubt the creditworthiness of an organization, given the way the CRAs
present company compliance with various parameters. An aggregate of the
many credit reports generated on a company yields a company credit report
that determines the financial health of the organization; these reports
are created to establish its creditworthiness. The accuracy of both text
analysis algorithms has been analyzed, and the better one, the TextBlob
Analyzer, has been put to use, since its accuracy was above 95% for
positive sentiments and above 91% for mixed sentiments.
ABSTRACT


Automatic driving of a car, also known as the driverless car, is perhaps
the most fascinating and challenging research topic of the next decade.
Although the automotive industry has moved radically toward automation,
with remarkable advances in all spheres, fully automatic driving remains a
distant dream at present. The task has been placed under third-generation
artificial intelligence innovation because of its underlying complexity and
the legal aspects in case of failure. Driving on the road is entirely a
human pursuit; hence, automating it requires the complete spectrum of
human ability, comprising all the senses and the motor organs. It involves
a series of tasks to be automated. The first of these is detecting the
driving lane automatically, which is challenging because of irregular and
inconsistent road conditions. In this chapter, vision-based sensing and
detection of the road is carried out by capturing the front view with a
camera mounted on the car. In addition, vehicle detection, which is
crucial for a driving system, is proposed: we implement vehicle
identification from the image captured by a camera fitted at the front of
the vehicle.
9.1 INTRODUCTION


A system that provides information about track condition, nearby vehicle
positions, and on-road pedestrians is considered an assisting tool en route
to full or partial automation of the driving task. Technically, the task is
known as lane detection, which consists of three subtasks: localizing the
road, identifying the vehicle position, and analyzing the position of the
vehicle relative to the road. It also includes localizing possible
obstacles on the path. We have identified some literature in this regard.
Massimo (2000) discussed the infrastructure a vehicle needs to be
intelligent: sensors, machine vision, and actuators. That work also throws
light on state-of-the-art technology pertaining to “Automatic Road
Following” and critically reviews the working of several vision-based
systems: SCARF, PVR III, RALPH, ROMA, GOLD, ARGO, and many more. A road
consists of multiple driving lanes, and it is necessary to mark the lane
in which the vehicle is to travel. Image processing techniques have been
found quite efficient in this regard, offering an integrated approach to
the dual task of detecting the road boundary along with the lane marking.
Further, lane detection has been addressed with the Canny edge detector.
Right and left line detection has been performed by the standard Hough
transform within a fixed search area. This works very well for straight
as well as slightly curved roads.

On the other hand, vehicle detection techniques include background
subtraction, feature-based methods, and frame differencing on
low-resolution aerial images of cars. Further, vehicle detection from
satellite imagery has been studied by Gill and Sharma, whose method counts
the vehicles within the desired region of a satellite image by employing
morphological image processing, segmentation, and edge detection, with an
accuracy of 86.5%.

To implement lane detection and vehicle detection, we propose to employ
convolutional neural networks (CNNs), which are widely applied to
analyzing visual imagery. Here, we use a CNN to recognize images of
vehicles and non-vehicles and also to locate their positions. The problem
is modeled as a binary classification task (vehicle/non-vehicle). The
model is trained on small samples (64 × 64 × 3) and ends in a
single-feature convolutional layer whose output for such a sample has
size 1 × 1 and is interpreted as the classification probability. Once the
model is trained, the width and height of the input frame are expanded.
As a result, the output layer's dimensions grow from 1 × 1 to a map whose
aspect ratio is comparable to that of the new, larger input. This can be
viewed as cutting a large input image into squares of the model's initial
input size (64 × 64) and identifying the object in each of those squares.


9.2 LITERATURE STUDY 


We review some of the research on on-road vehicle detection and lane
detection as follows.

Sun et al. (2006) presented a survey of vision-based on-road vehicle
detection systems, which are an important component of driver-assistance
systems. They highlighted several prominent prototypes designed over the
preceding 15 years. They discussed Hypothesis Generation (HG) methods,
namely (1) knowledge-based, (2) stereo-based, and (3) motion-based
methods; edge-based methods; and Hypothesis Verification (HV) methods,
namely (1) template-based and (2) appearance-based methods, along with a
critique of each. Moreover, the effectiveness of optical sensors in
detecting on-road vehicles is discussed. Vision-based vehicle detection
methods of the last decade, with special reference to the monocular and
stereo-vision domains, have also been reviewed. Later, a concise review
was carried out on vehicle detection and vehicle type classification from
videos recorded by traffic surveillance cameras.

Song et al. (2019) propose a vision-based vehicle detection system
that can be employed for counting vehicles on highways. Their approach
segments the road surface from the image, classifies it into a remote area
and a proximal area, and subsequently identifies the dimension and
location of each vehicle. Next, the Oriented FAST and Rotated BRIEF (ORB)
algorithm is employed to obtain the vehicle trajectories.

An exhaustive study of vehicle detection in dynamic conditions, in which
visual data are processed using feature representations known as object
proposal methods, has been presented by Sakhare et al. (2020). Inspired by
the capability of CNNs in analyzing huge volumes of image data, Leung et
al. (2019) applied deep learning techniques to vehicle detection in poorly
lit and nighttime environments, where the objects in photographs are
blurry and darkened.
9.3 PROPOSED APPROACH
9.3.1 DATASET 


The data for investigation are gathered from Udacity, which provides
labeled data of approximately 9000 images containing vehicles and
approximately 9000 images in which no vehicle is present, all of size
64 × 64. The dataset is drawn from the GTI Vehicle Image Database and the
KITTI Vision Benchmark Suite, together with samples extracted from the
project video. A sample of images from the dataset is shown in Figure 9.1.


[Image grid: sample 64 × 64 images labeled "vehicle" and "non-vehicle".]


FIGURE 9.1 An abstract view of data of vehicles and non-vehicles.


The data comprise 17,760 color images with a resolution of 64 × 64
pixels. The dataset has been partitioned into a training set of 90%
(15,984 samples) and a validation set of 10% (1776 samples). The division
is kept balanced between the two classes, because an unbalanced division
would become a dominant factor later, while training and testing the deep
learning model, and may cause bias toward a particular class.
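
A balanced 90/10 partition of this kind can be sketched with a stratified split, which holds out the same fraction of each class. The chapter does not give the exact per-class counts or the splitting code, so the equal class sizes (8880 each, summing to 17,760), the function name, and the fixed seed below are assumptions for illustration.

```python
import random


def stratified_split(labels, val_fraction=0.1, seed=0):
    """Return (train_idx, val_idx) with val_fraction of each class held out."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


# Hypothetical balanced labels: 8880 vehicle (1) and 8880 non-vehicle (0) samples.
labels = [1] * 8880 + [0] * 8880
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting per class, rather than over the whole index range, is what keeps the class proportions identical in the training and validation sets.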


9.3.2 FLOWCHART 


Figure 9.2 shows the detailed procedure involved in vehicle detection 
while experimenting using CNN. 
FIGURE 9.2 Flowchart of the CNN process.


9.3.3 ARCHITECTURE


The system and its underlying components are represented in Figure 9.3.
The CNN model uses Rectified Linear Unit (RELU) activation functions in
the convolution layers, whereas a sigmoid function computes the output at
the output layer. RELU is used in the hidden layers because that is where
learning happens, and it mitigates the vanishing gradient problem: it is
linear for x > 0 and 0 for all negative values. Tanh and sigmoid are
avoided as activation functions for the hidden layers because they are
prone to the vanishing gradient problem.
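
The vanishing gradient argument can be illustrated numerically. The sketch below compares the local derivative of RELU with that of the sigmoid and multiplies each derivative by itself to mimic the chain rule through several layers. It is a deliberate simplification (weights are ignored and every layer sees the same input), and the function names are illustrative.

```python
import math


def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return max(x, 0.0)


def relu_grad(x):
    """Derivative of RELU: 1 for x > 0, otherwise 0."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_grad(grad_fn, x, depth):
    """Product of `depth` identical local gradients, as in a deep chain rule."""
    return grad_fn(x) ** depth


for depth in (1, 5, 10):
    print(depth, chained_grad(relu_grad, 2.0, depth), chained_grad(sigmoid_grad, 2.0, depth))
```

For a positive input the RELU product stays at 1 regardless of depth, while the sigmoid product shrinks geometrically, which is the behavior the paragraph above refers to.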


[Block diagram with the components: Visual processing, Detected lines, Steering offset controller, Speed decision (velocity), Car Controller.]


FIGURE 9.3 High level system architecture.


Since the task is binary classification and the output layer uses a
sigmoid activation function, the cross-entropy loss function is applied.
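
For reference, the pairing of a sigmoid output with a cross-entropy loss can be sketched in a few lines. This is a plain-Python illustration of the standard binary cross-entropy formula, not the chapter's training code; the logits, labels, and clipping constant are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.5]  # hypothetical raw network outputs
labels = [1, 0, 1]         # 1 = vehicle, 0 = non-vehicle
probs = [sigmoid(z) for z in logits]
print(binary_cross_entropy(labels, probs))
```

The loss is small when the predicted probability is close to the true label and grows quickly as the prediction moves toward the wrong class.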


9.4 THE CONVOLUTION OPERATION 


In mathematical analysis, convolution is an operation that integrates
the product of two functions, one of which is reversed and shifted,
producing a third function that expresses how the shape of one is
modified by the other. This is represented in Figure 9.4.

In principle, the convolution operation comprises three elements:


• Input image: 64 × 64 matrices
• Kernel/filter/feature detector: 3 × 3 matrices
• Feature map


The feature map is the result of convolving the input image with the
feature detector matrix. It is displayed in Figure 9.5.
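
The construction of a feature map can be sketched in plain Python. The version below slides the kernel over the image without padding and with stride 1, so a 64 × 64 input and a 3 × 3 detector give a 62 × 62 feature map; the output sizes in Table 9.1 remain 64 × 64, which suggests the actual model pads its inputs. As in most CNN libraries, the kernel is not flipped, so strictly this is a cross-correlation. The toy image and kernel values are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    The kernel is applied without flipping, as CNN libraries usually do.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 64 x 64 checkerboard input and a toy 3 x 3 feature detector.
image = [[(r + c) % 2 for c in range(64)] for r in range(64)]
kernel = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
fmap = conv2d_valid(image, kernel)
print(len(fmap), len(fmap[0]))
```

Each entry of the feature map is the sum of element-wise products between the detector and the image patch beneath it.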
[Diagram: Input Image and Feature Detector matrices.]


FIGURE 9.4 Matrix representation.


[Diagram: Input Image, Feature Detector, and resulting Feature Map.]


FIGURE 9.5 Construction of feature map.
The generated feature map is a reduced representation of the input
image that is passed on to the next step. As shown in Figure 9.6, multiple
feature maps together form the convolutional layer.


[Diagram: an Input Image mapped to several Feature Maps; many feature maps are created to obtain the first Convolutional Layer.]


FIGURE 9.6 Representation of a convolutional layer.


Applying RELU is an additional step after the convolution operation. To
introduce nonlinearity, the rectified linear function is employed
(Fig. 9.7):


f(x) = max(x, 0)


FIGURE 9.7 RELU activation function.


9.5 RESULTS AND DISCUSSION 


The input is a video generated by a camera sensor mounted on a car in
motion on a highway, from which lanes and vehicles are detected. Hence,
the approach can be used in real time on a car to make the car an
intelligent agent.
9.5.1 VEHICLE DETECTION


The dataset is split into the training set (90%, 15,984 samples) and
the validation set (10%, 1776 samples).

A CNN is designed to classify the images into car and non-car classes.
The parameters of the fully convolutional network are given in Table 9.1,
which shows the structure of the CNN and its learnable parameters. Here,
“Conv” represents a convolution layer; all pooling operations are
performed using max pooling. Features of the images at different levels
are extracted in the convolution and pooling layers, and a total of
1,347,585 parameters are trained in the training phase.


TABLE 9.1 CNN Parameter Details.


Layer (type)                      Output size      Parameter count
Lambda_1 (Lambda)                 (64, 64, 3)      0
Conv1 (Conv2D)                    (64, 64, 128)    3584
Dropout_1 (Dropout)               (64, 64, 128)    0
Conv2 (Conv2D)                    (64, 64, 128)    147,584
Dropout_2 (Dropout)               (64, 64, 128)    0
Conv3 (Conv2D)                    (64, 64, 128)    147,584
Max_pooling2D_1 (MaxPooling2D)    (8, 8, 128)      0
Dropout_3 (Dropout)               (8, 8, 128)      0
Dense1 (Conv2D)                   (1, 1, 128)      1,048,704
Dropout_4 (Dropout)               (1, 1, 128)      0
Dense2 (Conv2D)                   (1, 1, 1)        129
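
The parameter counts in Table 9.1 can be checked with the usual formula for a convolution layer: kernel height × kernel width × input channels × filters, plus one bias per filter. The kernel sizes below (3 × 3 for the three convolutions, 8 × 8 for Dense1, 1 × 1 for Dense2) are not stated in the chapter; they are inferred because they reproduce the listed counts, and should be read as assumptions.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters


# (name, kernel_h, kernel_w, in_channels, filters); kernel sizes are inferred.
LAYERS = [
    ("Conv1", 3, 3, 3, 128),
    ("Conv2", 3, 3, 128, 128),
    ("Conv3", 3, 3, 128, 128),
    ("Dense1", 8, 8, 128, 128),
    ("Dense2", 1, 1, 128, 1),
]

counts = {name: conv2d_params(kh, kw, cin, f) for name, kh, kw, cin, f in LAYERS}
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {count:,}")
print(f"Total: {total:,}")
```

The Lambda, dropout, and pooling layers contribute no trainable parameters, so the five layers above account for the full total reported in the text.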


After training for 20 epochs, the model can be employed for making a
prediction on a random sample (Fig. 9.8).

Additionally, the same network trained with our 64 × 64 images can be
used to detect cars anywhere in the frame. Because the network is fully
convolutional, its layers scale to whatever the input size is, so the
output becomes a heat map. Consequently, bounding boxes can be drawn
around the hot positions.
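
How the 1 × 1 output grows into a heat map can be worked out from the layer shapes in Table 9.1. The sketch below assumes size-preserving convolutions, a single 8 × 8 max pooling, and an unpadded 8 × 8 convolution for Dense1; these details are inferred from the table rather than stated in the chapter, and the 720 × 1280 frame size is a hypothetical example.

```python
def heatmap_shape(height, width, pool=8, dense_kernel=8):
    """Output (rows, cols) of the fully convolutional classifier for a frame size.

    Assumes size-preserving convolutions, one pool x pool max pooling, and an
    unpadded dense_kernel x dense_kernel convolution (inferred from Table 9.1).
    """
    pooled_h, pooled_w = height // pool, width // pool
    rows = pooled_h - dense_kernel + 1
    cols = pooled_w - dense_kernel + 1
    if rows < 1 or cols < 1:
        raise ValueError("frame is smaller than the 64 x 64 training window")
    return rows, cols


print(heatmap_shape(64, 64))     # training-size input
print(heatmap_shape(720, 1280))  # hypothetical video frame
```

Under these assumptions each heat-map cell scores one 64 × 64 window of the frame, and neighboring cells correspond to windows shifted by 8 pixels.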


9.5.2 LANE DETECTION 


In this section, we detect the two lane lines on the road in each frame
using computer vision techniques (Fig. 9.9).
NN Prediction: CAR with value 1.0
Ground-truth: CAR with value 1.0


FIGURE 9.8 Car positioning identification.


FIGURE 9.9 Two lane detection.
9.5.3 PERFORMANCE MEASURE


Test Accuracy and Loss: The following accuracy was obtained on 
performing classification on the testing data from our dataset (Table 9.2). 


TABLE 9.2 Performance Parameters.


Epoch#   Time (sec)   Loss     Accuracy   Val_Loss   Val_Accuracy
1        54           0.0764   0.8940     0.0213     0.9778
2        48           0.0194   0.9756     0.0142     0.9866
3        48           0.0117   0.9855     0.0099     0.9897
4        48           0.0075   0.9904     0.0107     0.9879
5        48           0.0063   0.9923     0.0073     0.9926

[Plot: "Model Accuracy vs Number of Epochs"; accuracy of the model (y-axis, 91.2 to 100) against number of epochs (x-axis).]


FIGURE 9.10 Accuracy of the CNN model.


It is seen from Figure 9.10 that the accuracy of the model increases
sharply after the 1st epoch; after the 2nd epoch, however, the accuracy
increases only gradually.
[Plot: "Plot of Loss Function vs Number of Epochs"; loss (y-axis) against number of epochs (x-axis).]


FIGURE 9.11 Loss graph of the model.


In Figure 9.11, the value of the loss decreases sharply after the 1st
epoch and then decreases gradually over the remaining four epochs.


9.6 CONCLUSION 


Most of the literature discussed uses traditional image processing
techniques for detecting vehicles on the road and for detecting driving
lanes and road space from the captured image. In contrast, we employ CNN,
a novel technique, for detecting lanes and vehicles. The superiority of
CNN for the image classification task has been realized, and it delivers
results that were not attainable before. The diversity of the current
model can be enhanced by training the network with more varied and
nonidentical images.
ABSTRACT


An automatic vehicle damage detection platform can improve the
customer claiming process and reduce unnecessary repair costs for an
insurance company. Typically, the claim estimation process is manual and
requires human experts to evaluate the damage cost. This is error-prone,
time-consuming, and labor-intensive. In this chapter, a damaged vehicle
part detection platform called Intelligent Vehicle Accident Analysis
(IVAA), which provides artificial intelligence as a service (AIaaS), is
proposed. The system helps to assess vehicle parts' damage and severity
level automatically. An insurance company can use our service to speed up
the claiming process. IVAA is built on Docker images, which allows the
system to be scaled efficiently depending on the workload. A capsule
neural network (CapsNet) is applied for damage recognition, which
includes two phases: damage localization and damage classification. The
accuracy of damage localization is 93.28% and the accuracy of damage
classification is 98.47%.
10.1 INTRODUCTION


The major role of auto insurance companies is to serve their customers
by supporting the claiming process. Providing fast service in the field
and fast damage repair evaluation is the key to satisfying their
customers. The conventional claiming process usually takes from an hour to
a day for a customer when an accident happens. For example, he/she has to
wait for the arrival of the field personnel and for a repair quotation
from the insurance experts at the company. In the traditional claim
process, the field personnel must spend time inspecting the vehicle at the
accident site. Figure 10.1 shows the conventional claiming flow. It starts
with an appraisal, in which either the insurance company sends someone to
the customer's car to evaluate the damage or the customer brings the car
to the company or a registered body shop; the car damage is then
inspected, the repair is completed, and the reimbursement is made. The
whole processing time can be reduced and customer satisfaction increased
with the help of an artificial intelligence (AI) technology platform.


(1) Intimate the vehicle insurance company online or call.
(2) Field employee will inspect the vehicle.
(3) Provide requested document to field employee.
(4) Make payment to the workshop directly.
(5) Submit bills along with payment receipts to field employee.
(6) The company makes reimbursement payment.


FIGURE 10.1 Traditional claiming process.

There are several core areas in AI, such as knowledge, reasoning,
problem solving, perception, learning, and the ability to manipulate
objects. Deep learning is an effective methodology for building an
intelligent agent. The area is quite mature in recognition tasks. We apply
it to detect damaged parts and damage levels on a vehicle after an
accident. Integrating such intelligence into the company's service can
decrease the turnaround time of the claiming process and increase work
effectiveness.

This chapter focuses on the use of AI in the auto insurance company. We
design the software architecture along with services that the company can
use on top of its claim process. The prototype application demonstrates
how the claim processes can be automated, serving all stakeholders (field
worker, car owner, and body shop partner) to speed up the service anytime
and anywhere.

The outline of the chapter is as follows. The next section presents the
background, including a literature review of car damage evaluation systems
and object detection methods. Then, the overall system, the description of
each element, and the software architecture are presented. The
implementation of each system element is then described. Finally, the
evaluation process and the concluding remarks are presented.


10.2 BACKGROUND 


Research on AI has greatly improved the effectiveness of both the
manufacturing and service industries. A recent commercial application for
recognizing vehicle accident damage with AI uses IBM Watson. Figure
10.2(a) shows an example user interface from IBM Watson. It presents the
possible car damage and its types.


Vehicle Damage Analyzer

Choose File flatTireTest.jpg Upload File

Watson sees...
Class                 Confidence Score
BrokenWindshield      0.002
FlatTire              0.906
MotorcycleAccident    0.000

(a) IBM Watson. (b) Car damage detective.


FIGURE 10.2 IBM image recognition software.


In this work, the proposed platform, called the Intelligent Vehicle
Accident Analysis (IVAA) system, takes images as input in the same manner
as IBM Watson Visual Recognition and uses deep learning techniques for
recognition. Our work is built on open source software, supports
processing multiple images at a time, and provides a user-friendly
interface and price estimation.

Figure 10.2(b) presents Car Damage Detective, open source software
published on GitHub by Neokt. In comparison, IVAA detects the specific
vehicle part from images, supports multiple images of the vehicle, and
provides a price estimate in a mobile application.

IBM Watson is a system based on cognitive computing as shown in 
Figure 10.3. It contains three elements: Watson Visual recognition, web 
server, and mobile application. 


[Diagram: architecture connecting the Mobile Application, web server, and Watson Visual Recognition.]


FIGURE 10.3 IBM Watson architecture.


Table 10.1 compares the three software systems in several aspects. The
required features of the software include classification, localization,
automatic model training, and cloud support.

The IBM Watson-based system [3] relies on its own recognition engine,
while ours and Car Damage Detective are based on TensorFlow and Keras,
respectively. Compared to these, we can detect more vehicle parts and more
damage levels. To achieve an accurate estimate, the model should be able
to infer the type of damage. Because this affects the expense, the service
needs to suggest whether to repair or replace the damaged part.

The template matching method is a naive approach to finding a similar
pattern in the image. Its extensions are gray scale-based matching and
edge-based matching. Gray scale-based matching can reduce the computation
time, running up to 400 times faster than the baseline method, while
edge-based matching performs the matching only on the edges of an object.
The output is a gray scale image in which each pixel represents the degree
of matching.
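
The naive form of template matching can be sketched as an exhaustive search that scores every position of the template over the image. The version below uses the sum of squared differences as the matching score, which is one common choice and an assumption here, since the chapter does not name a specific measure; the image, template, and function names are illustrative.

```python
def match_template(image, template):
    """Return a map of sum-of-squared-differences scores (lower = better match)."""
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    scores = []
    for i in range(rows):
        row = []
        for j in range(cols):
            ssd = 0
            for di in range(th):
                for dj in range(tw):
                    diff = image[i + di][j + dj] - template[di][dj]
                    ssd += diff * diff
            row.append(ssd)
        scores.append(row)
    return scores


def best_match(scores):
    """Return the (row, col) position with the smallest score."""
    return min(
        ((i, j) for i in range(len(scores)) for j in range(len(scores[0]))),
        key=lambda pos: scores[pos[0]][pos[1]],
    )


# Toy gray scale image containing the 2 x 2 template at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
scores = match_template(image, template)
print(best_match(scores))
```

The score map plays the role of the gray scale output described above: one value per candidate position, expressing how well the template matches there.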
TABLE 10.1 Software Comparison.


                              IVAA             IBM Watson-based [3]     Car damage detective [4]
Features
  Classification              Yes              Yes                      Yes
  Localization                Yes              No                       No
  Deep learning library       Tensorflow       IBM Watson               Keras
  Store result                Central server   No                       No
  Deployment                  Private cloud    IBM cloud                Private cloud
  Labeling system             Yes              No                       No
  Model training and tuning   API call         IBM Watson               Manual
Interface
  Web application             Yes              Yes                      Yes
  Mobile platform             via LINE         Native iOS and Android   No
Visualized result
  Parts                       Yes              Yes                      Yes
  Accuracy confidence         Yes              Yes                      No
  Estimate repair price       Yes              No                       No
Detection system
  Type of detection           23 parts         3 zones                  4 types
  Damaged levels              5 levels         3 levels                 1 level

A convolutional neural network (CNN) is another approach for image
recognition. It can be used to recognize the category of an image, and
it can be adapted to perform object detection and localization. Several
existing networks are described below.

Faster region-based convolutional neural network (Faster R-CNN) is
developed based on R-CNNs.’ The object detection process is separated
into two stages.!° In the first stage, R-CNN applies selective search to
generate the proposed regions. In the second stage, it applies an image
classification model to extract features from the regions proposed in
the first stage, and then feeds those features to a Support Vector
Machine (SVM) to generate the final predictions.'"”

The improved version of R-CNN provides faster and more accurate results.
The main modification in Faster R-CNN is to use a CNN, rather than
selective search, to generate the object proposals in the first stage.'>
This layer is called the region proposal network (RPN). RPN uses the base
network to extract feature maps from the image. It then divides the
feature maps into multiple square tiles and slides a small network across
each tile. The small network assigns a set of object confidence scores
and bounding box coordinates to each tile location.'* RPN is designed to
be trained in an end-to-end manner. Using Faster R-CNN can reduce the
training and detection time.!*!°

Recently, the capsule neural network (CapsNet) has shown better accuracy
than the typical CNN. A capsule is a group of neurons whose activity
vector represents the instantiation parameters of a specific type of
entity such as an object or an object part.'’ CapsNet contains capsules
rather than neurons. A group of capsules learns to detect an object
within a given region of the image and outputs a vector whose length
represents the estimated probability that the object is present and whose
orientation encodes the object’s pose parameters.'® The capsules are
equivariant to the object pose, orientation, and size.

The architecture contains an encoder and a decoder, as shown in
Figure 10.4. The encoder takes the input data and converts it into an
n-dimensional vector. The weights of the lower-level capsules
(PrimaryCaps) must align with the weights of the higher-level capsules
(DigitCaps). At the end of the encoder, the n-dimensional vector is passed
to the decoder, which contains several fully connected layers. The main
job of the decoder is to take the n-dimensional vector and attempt to
reconstruct the input from scratch, which makes the network more robust
by generating predictions based on its own weights.


[Diagram labels: Input Data, ReLU Conv1, PrimaryCaps, DigitCaps; Encoder,
Decoder]


FIGURE 10.4 The architecture of capsule neural network (CapsNet) model."”

The four main computation stages in a capsule neural network are:
(1) matrix multiplication, (2) scalar weighting, (3) dynamic routing, and
(4) vector-to-vector nonlinearity.

First, the model performs a weight matrix multiplication on the
information passed from the lower layer to the higher layer to encode
the spatial relationships. Next, the capsules from the lower level
adjust their weights according to the weights of the higher level. The
dynamic routing algorithm passes data between layers in the network
effectively, although it increases the time and space complexity. The
last step compresses the information so that the condensed information
can be reused.
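
The four stages can be sketched in a few lines of plain Python. This is
only an illustrative sketch of routing-by-agreement as it is commonly
described for capsule networks, not the IVAA implementation; the function
names (matvec, squash, softmax, route), the number of routing iterations,
and the toy weights are all assumptions made for this example.

```python
import math


def matvec(W, u):
    """Multiply matrix W (list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def squash(s):
    """Stage 4: keep the direction of s and map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(b):
    m = max(b)
    e = [math.exp(x - m) for x in b]
    total = sum(e)
    return [x / total for x in e]


def route(u, W, iterations=3):
    """u[i]: output of lower capsule i; W[i][j]: matrix toward higher capsule j."""
    n_low, n_high = len(W), len(W[0])
    # Stage 1: matrix multiplication gives one prediction vector per (i, j).
    u_hat = [[matvec(W[i][j], u[i]) for j in range(n_high)]
             for i in range(n_low)]
    b = [[0.0] * n_high for _ in range(n_low)]  # routing logits
    for _ in range(iterations):
        # Stage 2: scalar weighting with coupling coefficients.
        c = [softmax(b[i]) for i in range(n_low)]
        v = []
        for j in range(n_high):
            dim = len(u_hat[0][j])
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_low))
                 for d in range(dim)]
            v.append(squash(s))
        # Stage 3: dynamic routing raises logits where prediction and output agree.
        for i in range(n_low):
            for j in range(n_high):
                b[i][j] += sum(a * q for a, q in zip(u_hat[i][j], v[j]))
    return v, c


# Hypothetical toy setup: two lower capsules, two higher capsules.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
u = [[1.0, 0.0], [1.0, 0.0]]
W = [[identity, zeros], [identity, zeros]]
v, c = route(u, W)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

In this toy setup both lower capsules agree on the first higher capsule,
so their coupling coefficients shift toward it, and every output length
stays below 1 so that it can be read as a probability.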


10.3 SYSTEM OVERVIEW AND ELEMENTS


There are four user roles in the IVAA system: insurance experts, data
scientists, operators, and field employees, as shown in Figure 10.5. Four
tools are developed for these four roles: a data labeling tool for
insurance experts, deep learning APIs for data scientists, a web
monitoring application for operators, and a LINE chatbot through which
field employees interact with the back-end server, as shown in the
product layers in Figure 10.5.


[Diagram labels: Data Labeling Tools, Web Monitor Application, LINE Chatbot]


FIGURE 10.5 System architecture. 


10.3.1 DATA LABELING TOOLS 


The labeling task is one of the time-consuming tasks that must be done
before the model training process can start. Traditional labeling
software such as LabelImg and Imglab’’ works as a standalone application,
which makes it hard to handle a large number of data annotations.
Figure 10.6 shows the flow of our tool, which has a web interface where
users can work on the labeling task collaboratively. The labeling tool
returns a downloadable JSON file for future use. VueJS is used as the
frontend framework, together with a REST API server. The labeling tool is
also useful for adding more labeled images of damage for future
retraining.


[Sequence diagram; actor: Insurance Expert. Label shown in the diagram:]

{ ‘Part’ : ‘door-back’,
‘Side’ : ‘left’,
‘Level’ : ‘medium’ }


FIGURE 10.6 Data labeling tools sequence diagram. 


10.3.2 DEEP LEARNING APIS


The APIs are gateways designed for data scientists and developers to
train and deploy the model. Figure 10.7(a) presents the deep learning API
that takes new data and model hyper-parameters as training inputs to
create a new deep learning model. The API returns the model
identification (model ID) to the user as a link for model deployment.
Figure 10.7(b) shows the testing API, which takes the testing data and
model ID as inputs to deploy the model. It returns the list of damaged
parts and damage levels on the vehicle along with the accuracy.
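
The request and response bodies of the two APIs can be sketched as plain
JSON handling. The key names (Photo, Label, Hyperparameter, model-id,
project-id, message, damaged-parts, accuracy) are taken from the payloads
shown in Figure 10.7; the helper function names, the sample values, and
the shape of each damaged-parts entry are hypothetical and only serve the
illustration.

```python
import json


def build_train_request(photos, labels, hyperparameters):
    """Body for the training API: new data plus model hyper-parameters."""
    return json.dumps({
        "Photo": photos,
        "Label": labels,
        "Hyperparameter": hyperparameters,
    })


def build_test_request(photo, model_id, project_id):
    """Body for the testing API: testing data plus the model ID."""
    return json.dumps({
        "Photo": photo,
        "model-id": model_id,
        "project-id": project_id,
    })


def parse_train_response(body):
    """Return the model ID, or raise if training was not reported successful."""
    data = json.loads(body)
    if data.get("message") != "successful":
        raise ValueError("training request was not successful")
    return data["model-id"]


def parse_test_response(body):
    """Return the list of damaged parts and the reported accuracy."""
    data = json.loads(body)
    return data["damaged-parts"], data["accuracy"]


# Hypothetical sample values, not real system output.
train_body = build_train_request(["a.jpg"], ["door-back"], {"epochs": 10})
model_id = parse_train_response(
    '{"message": "successful", "model-id": "m1", "project-id": "p1"}')
test_body = build_test_request("car.jpg", model_id, "p1")
parts, accuracy = parse_test_response(
    '{"damaged-parts": [{"Part": "door-back", "Level": "medium"}],'
    ' "accuracy": "95%"}')
```

Returning only the model ID from the training response mirrors the
description above, where the ID is the link between training and later
deployment.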


10.3.3 WEB MONITORING APPLICATION


The operators monitor the cases using the web monitoring application. It
shows historical data that contains the number of cases, the number of
processed images, and the number of days that the system has operated.
The visualization is displayed as a heat map showing the frequency of
accidents by location and calendar day. Figure 10.8 shows an example of
the tasks that an office operator monitors, with an overview of the case
and location.


(a) Model training API sequence diagram (Data Scientist and Deep Learning
API; input: model configuration file).

Request to /train:

{ ‘Photo’ : ‘*photos’,
‘Label’ : ‘*labels’,
‘Hyperparameter’ : ‘*hyperparameter’ }

Response:

{ ‘message’ : ‘successful’,
‘model-id’ : ‘model identification’,
‘project-id’ : ‘project identification’ }

(b) Model testing API sequence diagram (Data Scientist and Deep Learning
API; input: model identification file).

Request:

{ ‘Photo’ : ‘test photo’,
‘model-id’ : ‘model identification’,
‘project-id’ : ‘project identification’ }

Response, return( result ):

{ ‘damaged-parts’ : [...],
‘download_duration’ : ‘download time’,
‘processing_duration’ : ‘processed time’,
‘accuracy’ : ‘...%’ }


FIGURE 10.7 Deep learning APIs sequence diagram.


[Sequence diagram messages: submit( caseInformation, *photos );
return( visualization )]


FIGURE 10.8 Web monitoring application sequence diagram. 


10.3.4 LINE OFFICIAL INTEGRATION


Field employees use the LINE chatbot service specifically designed for 
insurance field employees. The chatbot takes the damaged car images via 
LINE chat and gives the resulting car model and price table images, along 
with the list of body shop details and locations. 

In Figure 10.9, the field employee sends the details of an accident case
and the customer ID, shares the accident location, and uploads the damaged
car images. Next, the deep learning testing API is executed to recognize
the damaged parts and classify the damage level from the submitted photos.
The chatbot stores the communication dialogues in the main database.


[Sequence diagram between Field Employee and LINE Chatbot; messages:
startService( ), input( CustomerID ); shareLocation( )]


FIGURE 10.9 LINE official integration sequence diagram. 


10.3.5 SYSTEM SOFTWARE ARCHITECTURE 


All the above services are deployed on the private cloud system with the
hardware specification listed in Table 10.2. We use the private server to
train the model, serve the model, and host the website.

TABLE 10.2 Hardware Specification.


Hardware                    Specification
CPU                         Intel (R) Core (TM) i5-2400 CPU @ 3.10 GHz
GPU                         NVIDIA Tesla K40c
RAM                         24 GB
HDD                         4 TB
Internet connection speed   100 Mbps


Figure 10.10 displays the software stack of the system. OpenStack is used
to manage computational resources such as memory, CPU, and network, and
to provide them to each specified task on the containers. The visualized
resource monitoring part is divided into two sections. The first section
utilizes Grafana to monitor resources on the private cloud via OpenStack,
and the second section utilizes Sahara, which monitors resources on the
private cloud directly. Kubernetes scales the computational resources for
each Docker container. Docker Engine is used to virtualize the container
resources defined for the task management component, which is IVAA Core.
MongoDB is the main database since it can be scaled out to support larger
data sizes and it supports unstructured data.


[Stack diagram labels: IVAA Core; MongoDB; Docker Engine; Kubernetes;
Grafana; OpenStack; Sahara; Private Cloud]


FIGURE 10.10 System software stack. 


10.4 IMPLEMENTATION


The implementation is divided into four parts based on the four components
of IVAA: (1) data labeling tool, (2) deep learning APIs, (3) web
monitoring application, and (4) LINE chatbot.

Figure 10.11 demonstrates the interface of the data labeling tool. An
insurance expert uploads photos of the damaged vehicle to the IVAA system.
The user then selects the part and labels the photo of the damaged
vehicle. The tool makes the labeling process easier than manual labeling.
Users can also download the labeled and unlabeled photo data from the IVAA
system to the local machine.


(c) Label damaged part (d) Label damaged level 


FIGURE 10.11 Data labeling tool. 


The labels used for building the models come from multiple insurance
experts, and the experts may have different subjective opinions on how
some of the cases should be labeled. We have studied this scenario by
designing a multi-expert learning framework that assumes that the
information on who labeled each case is available. The framework
explicitly models different sources of disagreement and lets us naturally
combine labels from different human experts to obtain both a consensus
classification model, representing the model that the group of experts
converges to, and individual expert models.

Labeling data, or data annotation, is important in deep learning. It
provides the initial setup data for the machine learning task. Mislabeled
data can easily lead to wrong predictions. Labeling tools are valuable,
and the good ones are usually costly.

In the IVAA network, CapsNet is used to recognize the damaged vehicle
parts and the levels of severity from the vehicle’s photos. We keep each
trained model’s replica on a single GPU. The memory holds the large number
of weights for the layers. In addition, by omitting batch normalization
on top of those layers, we are able to increase the overall number of
inception blocks considerably. Table 10.3 shows examples of the damage
levels that are identified by our system.


TABLE 10.3 Representing the Damaged Level. 


Damaged level Colors 
No damaged White 
Low level damaged Yellow 
Medium level damaged Orange 
High level damaged Red 
Replacing damaged Gray 


In Figure 10.12, we use the IVAA network to recognize many photos taken
from many perspectives. The parts are mapped to the images below, where
the fill color shows the damage level of each damaged part. The level of
damage follows the Thai General Insurance Association (TGIA), an
organization that promotes and supports the nonlife insurance industry,
such as accident insurance.

We provide two main APIs for training and testing the model for the data
scientists of an insurance company, as shown in Figure 10.13. The training
API requires new images and a model configuration as inputs to train and
create the new model. It returns the model ID to the user for future model
usage. The testing API takes testing data and a model ID as inputs for
testing the model. It returns the list of damaged parts and damage levels
on the vehicle image. The APIs conform to the REST architectural style
(RESTful web services), providing interoperability between computer
systems on the Internet. REST-compliant web services allow the requesting
systems to access and manipulate textual representations of web resources
by using a uniform and predefined set of stateless operations.


FIGURE 10.12 IVAA network recognizing the photos.


Deep learning APIs are gateways for the user to deploy our system. They
enable adding new data sets and retraining the deep learning model
effectively. Incremental retraining allows the model accuracy to be
improved step by step when computing power is limited.

Figure 10.14 presents the web application developed using the VueJS
framework with the Bulma CSS framework. The web monitoring application
targets office workers, system administrators, and business managers.
The application has six main pages for monitoring and interacting with
the system.

The Login page of our web monitoring application is shown in
Figure 10.14(a). The Authentication Required function in a Go programming
language library is adopted. Security in the front end is one way to
limit user interference. However, some users require more flexibility
than others, and there are always trade-offs.

Figure 10.14(b) shows the dashboard page of our system. It contains
three elements: (1) the cases, (2) the images processed, and (3) how long
the system has operated. The first element is the important one; it
presents the cases reported as well as case management. The second
element is about the images and their processing. The third element is
the system administrative information. The dashboard is a data
visualization tool that allows all users to analyze issues in their
system. It provides an objective view of performance metrics and serves
as an effective foundation for further dialogue.


(b) Testing model API


FIGURE 10.13 Deep learning APIs. 


Figure 10.14(c) shows the heat map of the cases reported. The primary
purpose of a heat map is to visualize the volume of events by location
within the data sets and to help direct viewers toward particular areas.
The fading of the color shows the density of accident cases in that
location.

An accident case can be inserted via the case insertion page, as shown
in Figure 10.14(d). For each accident case, the case identification
number, the customer identification number, and the accident location are
required. The images of the damaged vehicle can be uploaded, and a
drag-and-drop zone is provided for uploading the photos conveniently. The
submitted case is reviewed for approval, and the case’s information is
shown in Figure 10.14(e). In the figure, the case information page shows
a damage level of 4 according to the images of the vehicle. Moreover, it
indicates the repair cost based on the damage level.

The user can find all historical accident cases from the case finder page 
as shown in Figure 10.14(f). The search can be done by the case identifica- 
tion number, customer identification number, and accident date. The detail 
button is used to show the detailed information. 

(a) Login page (b) Dashboard

FIGURE 10.14 Web monitoring application.


Figure 10.15 shows the LINE chatbot interface. The LINE official account,
named IVAA as shown in Figure 10.15(a), is intended for the claiming
process of the auto insurance company. The LINE Messaging API allows data
to be passed between the server of the chatbot application and the LINE
Platform. When a user sends the chatbot a message, a webhook is triggered
and the LINE Platform sends a request to the webhook URL. The server then
sends a request to the LINE Platform to respond to the user.

The requests are sent over HTTPS in JSON format. The users can post the
IVAA web page onto their LINE timelines to make it visible to all their
friends. The LINE platform allows the user (field employee or customer)
to send the damaged car images to the company's LINE official account to
get the price and damage results.
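
The webhook exchange can be sketched without any network code. The sketch
below checks the request signature and prepares one reply body per
incoming message event. It assumes, to the best of our understanding of
the LINE Messaging API, that the signature is the Base64-encoded
HMAC-SHA256 of the raw request body keyed with the channel secret, and
that a reply body carries a replyToken and a list of messages; both
points should be checked against the official reference. The function
names and the reply texts are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def signature_is_valid(channel_secret, body, signature):
    """Compare the received signature with one computed from the raw body."""
    digest = hmac.new(channel_secret.encode("utf-8"), body,
                      hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("utf-8")
    return hmac.compare_digest(expected, signature)


def build_replies(body):
    """Build one reply body per message event in the webhook request."""
    replies = []
    for event in json.loads(body).get("events", []):
        if event.get("type") != "message":
            continue
        if event["message"]["type"] == "image":
            text = "Photo received."
        else:
            text = "Please send a photo of the damaged vehicle."
        replies.append({
            "replyToken": event["replyToken"],
            "messages": [{"type": "text", "text": text}],
        })
    return replies


# Hypothetical webhook request carrying one image message.
secret = "example-channel-secret"
raw_body = json.dumps({
    "events": [{
        "type": "message",
        "replyToken": "token-1",
        "message": {"type": "image", "id": "1"},
    }]
}).encode("utf-8")
good_signature = base64.b64encode(
    hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
).decode("utf-8")
replies = build_replies(raw_body)
```

Verifying the signature before parsing the body lets the server discard
requests that did not come from the LINE Platform.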

After adding IVAA as a friend, the user starts using the service as in
Figure 10.15(b). The system requests the customer identification number
for authentication. Figure 10.15(c) presents the authentication step of
our service. The service then generates a unique case identification
number for the user, which is used for tracking the service progress.

The service also requires the user to share the accident location, as
shown in Figure 10.15(d). Sharing the accident location allows the field
worker to head to the location.

Figure 10.15(e) shows the process of uploading the damaged vehicle’s
photos. First, the user takes photos of the damaged vehicle, including
the front-side view, the back-side view, the left-side view, and the
right-side view. After that, the system acknowledges the receipt of the
photos. Then, our service returns the analysis result from the deep
learning model. The user can visualize the damage level on the vehicle
parts using different colors, as in Figure 10.15(f). In addition, our
service can estimate the repair price with a breakdown by the damage
level of the vehicle parts.


10.5 EVALUATION 


The evaluation of the application is broken down into three parts. The
first part evaluates the IVAA deep learning models. Second, the user
satisfaction toward the web application and LINE chatbot is assessed.
Finally, a comparison of our platform service against public cloud
platforms is presented.

The IVAA deep learning model is compared against the template matching
approach and another object detection approach on the selected car damage
data set. Template matching is a technique in digital image processing for
finding small parts of an image which match a template image. A typical
object detection algorithm such as R-CNN is used.

The IVAA deep learning model utilizes CapsNet. Due to its recent
outstanding performance, we applied CapsNet to detect the damaged vehicle
object from the photos, and then recognize the damaged vehicle parts and
the levels of severity. However, since the focus of the work is the
application of the model to the auto insurance claiming process, an
alternative object detection model could be used instead.


FIGURE 10.16 The architecture of IVAA CapsNet model.


The architecture of the IVAA CapsNet model is shown in Figure 10.16. From
the bounding box of the damaged part, CapsNet classifies the damage into
the five levels mentioned above. The car part is highlighted according to
the damage level.

The Toyota Camry image set available at
https://gitlab.com/Intelligent-Vehicle-Accident-Analysis is used for
evaluation. The data set includes 1624 images, which we divide into 80%
for training and 20% for testing. IVAA utilizing CapsNet yields an
accuracy of up to 97.21%, as shown in Figure 10.17. This is greater than
the accuracy of the template matching approach (93.58%). The object
detection approach, a traditional computer vision technique that explores
multiple paths with a simplified algorithm and lower computation cost,
reaches 91.53%.

To deploy the model for LINE chatbot use, we set the threshold for
bounding box detection and severity classification to 97.21%. The
intersection over union (IoU) of our proposed system is 89.53%. The
average inference time per image is 13.12 s on our private cloud.
Figure 10.18 shows the inference time as the number of images increases
up to 20.
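
For reference, the IoU of two axis-aligned bounding boxes is the area of
their overlap divided by the area of their union. The sketch below assumes
boxes given as (x_min, y_min, x_max, y_max); this box format and the
function name iou are assumptions for the example, since the chapter does
not state how its boxes are encoded.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1).
example = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A predicted box is usually counted as a correct detection only when its
IoU with the labeled box exceeds a chosen threshold.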


[Bar chart: Accuracy (%) by technique (IVAA network, Template matching,
Object detection)]


FIGURE 10.17 Accuracy on Toyota Camry data set.


FIGURE 10.18 The inference time per the images.


The confusion matrix is shown in Figure 10.19. Our model can detect the
damaged vehicle part very accurately. The data set and the comparison code
for the tested car are available at
https://gitlab.com/Intelligent-Vehicle-Accident-Analysis.


FIGURE 10.19 The confusion matrix of IVAA network.


10.5.1 USER SATISFACTION


User satisfaction with the application is measured in two aspects:
application usage and the intelligence module. For the application
aspect, the questionnaire covers usability, reliability, security,
interface, and availability.°° The user satisfaction scores are shown in
Table 10.4.


TABLE 10.4 Usability Test of Application Module (5-Highest). 


Aspect Score 
Usability 4.93 
Reliability 4.76 
Security 4.56 
Interface 4.66 
Availability 4.56 
Average 4.69 


For the intelligence module, Table 10.5 shows the summarized scores.
There are five criteria: prediction speed, prediction accuracy,
expectation satisfiability, input format satisfiability, and output
format satisfiability.


TABLE 10.5 Usability Test of Intelligence Module (5-Highest). 


Aspect Score 
Prediction’s speed 4.76 
Prediction’s accuracy 4.56 
Prediction’s expectation 4.60 
Input data format 4.53 
Output data format 4.83 
Average 4.66 


The general opinions of 30 users are collected, and the average score for
each aspect is shown. The average overall score is 4.69/5 for the
application side and 4.66/5 for the intelligence module. In addition,
93.3% of the users would highly recommend the system to their friends or
companies. Moreover, experienced users expect to use our system in real
situations.

Tables 10.6 and 10.7 compare our platform against public services and
general web development. IVAA targets a specific task, car damage
detection, rather than general vision tasks. Our service solution using
LINE is ready to use, and the development process is not complex compared
to using a WebApp or NativeApp.


TABLE 10.6 Model Platform Comparison.


Feature                IVAA            Google AutoML Vision   Amazon Rekognition
Task                   Specific task   General task           General task
Cloud                  Private cloud   Public cloud           Public cloud
Custom data            Yes             Yes                    No
Custom model           Yes             Yes                    No
Car damage detection   Yes             No                     No


TABLE 10.7 Development Platform Comparison.


Feature                   IVAA   NativeApp     WebApp
Home screen real estate   Low    High          Low
Time to market            Fast   Slow          Middle
Accessibility             LINE   Application   Browser
Security                  High   Manual        Manual

10.6 CONCLUSION 


The IVAA system is an artificial intelligence as a service (AIaaS)
offering for an auto-insurance company. The system consists of four
modules for four stakeholders: data labeling for insurance experts, a
deep learning API for data scientists, the web monitoring application for
the operators, and LINE official integration for field employees. We
evaluate the system in two aspects: the damage detection capability and
the application usability. The accuracy results demonstrate that our
object detection model can predict the damaged part and damage level
correctly with up to 97.21% accuracy when tested on the Toyota Camry data
set. The average inference time per image is 13.12 seconds. The users
are satisfied with our system: the average user satisfaction score is
4.69/5 for application usage and 4.66/5 for the intelligence module.

Future work includes integrating our system with a driver-side
application to track the driver location and integrate driving
information. The whole process of retraining when adding more images can
be automated by periodic schedules. The database of body shops will be
added to the backend.


ABSTRACT


Medical image security is becoming more and more important. Full image
encryption is not necessary in the medical field because a partial amount
of encryption is enough to provide security. This chapter proposes a
partial image encryption of medical images that uses different permutation
techniques. The proposed technique mainly consists of a permutation
process and a diffusion process. The original medical image is divided
into nonoverlapping blocks with the help of a block size table. Then the
positions of the pixels in every block are shuffled according to a chaotic
sequence generated from the chaotic map system and the predefined block
size table. In the diffusion process, a mapping operation based on the
basic intensity image (BII) and the different permutation techniques is
applied to obtain partially encrypted medical images. Experimental
results show that the proposed method provides more security with less
complexity and computational time.


11.1 INTRODUCTION 


In the last two decades, owing to rapid progress in communication systems
and multimedia technology, digital image encryption has played an
important role in secure communication applications in fields such as
military, medical, and satellite. Partial encryption is one of the most
commonly used encryption techniques in the medical field, where it is not
necessary to encrypt the full image, because a small amount of encryption
leads to high security at a lower cost in terms of time and complexity.
A partially encrypted image generally gives some clue about the original
image.*?!°"'

Guodong et al.'* explained an auto-blocking technique for segmenting the
original image with a predefined block size, in which an ECG signal is
used as the key for generating a random sequence using a chaotic system.
Different ECG signals yield different keys for encrypting different
images. Lu and Gou et al.'’ explained the segmentation of an image using
different block sizes for the permutation process, with the diffusion
process implemented with the help of a dynamic index technique.
Permutation of the image is done along either a horizontal or a vertical
cross section of the original image. Zhongyunhua et al.° introduced an
image encryption algorithm based on a two-dimensional sine logistic map.
Compared to other chaotic maps, it has many advantages, such as greater
ergodicity, a hyperchaotic property, and a very low implementation cost.
Panduranga et al.’ described a partial image encryption scheme for
controlling the amount of encryption with different step sizes. In this
method, a multistage Hill cipher technique is used for manipulating the
pixel values in the original image, and the division of blocks is varied
to control the amount of encryption. This method can be used in smart
cameras where a specific amount of encryption is required. Kumar et al.*
proposed a block-wise approach for encrypting an image partially, where
different combinations of block sizes produce various partially encrypted
images. To shuffle the pixels within a group, a chaotic system has been
incorporated.”* *°

Xiangyun et al.'* introduced the concept of color image encryption in the
spatial and frequency domains, which includes the discrete wavelet
transform (DWT) and six-dimensional hyperchaos. In the spatial domain,
the key sequence is generated by the hyperchaotic system, and the
segmentation of the original image is done with the help of the DWT,
which leads to four frequency bands of the original image in the
frequency domain. Belaze et al.* explained the most common
shuffling-diffusion process based on an image encryption system in which
the diffusion of the image occurs first, followed by a chaos-based
shuffling process. Xiang et al. described full and selective encryption of
medical images. This technique consists of several stages, where every
stage consists of a permutation phase and a diffusion phase. A
block-based concept is used to permute and encrypt with the help of a
chaotic map.'* Parameshachari et al. (2013) proposed partial encryption
for medical images using DNA encoding and addition techniques. A random
image is generated from a chaotic map and undergoes DNA addition with the
original image to obtain different partially encrypted images.'° Bhatnagar
and Wu explained the concept of SVD and pixels of interest to selectively
encrypt a group of pixels in the input image. The idea of this method is
to use a saw-tooth space-filling curve to shuffle the pixel positions,
while diffusion is done with the help of a nonlinear chaotic map.?
Mahmood and Dony explained an algorithm that divides the medical image
into two parts based on the amount of significant and nonsignificant
information, namely the region of interest (ROI) and the region of
background (ROB). To reduce the encryption time, AES is applied to the
ROI and Gold code (GC) to the ROB.° Parameshachari et al. introduced the
partial encryption of color RGB images. In this method, the input color
image is segmented into a number of macroblocks. Based on the interest, a
few significant blocks are selected and encrypted using a chaotic map.’
Chowdhary et al.'? explained different fuzzy segmentation methods used for
dividing and detecting brain tumors in medical MRI images. Chowdhary et
al.”° introduced a hybrid scheme for breast cancer detection using an
intuitionistic fuzzy rough set technique. The hybrid scheme starts with
image segmentation using an intuitionistic fuzzy set to extract the zone
of interest and then enhances the edges surrounding it. Chowdhary?!
explained how a clustering approach retains the positive points of
possibilistic fuzzy c-means while overcoming the coincident cluster
problem, reducing noise, and lowering sensitivity to outliers. Chowdhary
et al.” presented an experimental assessment of the beam search algorithm
for improving image caption generation.

The chapter is divided into several sections. Section 2 explains the
various permutation methods used in the proposed system. Section 3 gives a
detailed description of the proposed partial encryption system based on
the various permutation techniques. The performance metric analysis of
the proposed system is explained in Section 4. Finally, the conclusion of
the chapter is given in Section 6.


11.2 PERMUTATION TECHNIQUES


Generally, any image encryption algorithm involves a permutation process
and a diffusion process. In the permutation process, the positions of the
pixels are changed. Different techniques are used for permuting the
image, including the chaotic map, continuous chaos (CC), Gray code (GC),
Sudoku code (SC), and Arnold cat map (AC). Each permutation method is
described in detail below.


11.2.1 CHAOTIC MAP 


Selecting the map is a major and very important step in any image 
encryption scheme. Here, a chaotic map is used for the permutation process 
because of features such as periodic windows, chaotic intervals, 
complexity, and sensitivity to initial conditions; using a chaotic system 
makes the encryption system more secure and less complex. The chaotic map 
fulfills the requirements of an encryption system in terms of privacy and 
efficiency.'° 

Mathematically, the chaotic map is defined by eq 11.1, which includes two 
parameters, r and x0, that are used as the key for the encryption. 


$$X_{n+1} = r\,X_n\,(1 - X_n) \tag{11.1}$$


where the initial value X_0 = x0 lies between 0 and 1 and r lies between 
3.57 and 4. 
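
The iteration in eq 11.1 can be sketched in a few lines. This is only an illustrative sketch: the function name `logistic_sequence` and the key values `x0 = 0.3141`, `r = 3.99` are assumptions chosen for the example and are not taken from the chapter.

```python
import numpy as np

def logistic_sequence(x0, r, n):
    """Return n iterates of X_{k+1} = r * X_k * (1 - X_k), starting from x0."""
    if not (0.0 < x0 < 1.0):
        raise ValueError("x0 must lie strictly between 0 and 1")
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = r * x * (1.0 - x)   # one step of eq 11.1
        seq[k] = x
    return seq

# Hypothetical key (x0, r); any pair in the stated ranges could be used.
seq = logistic_sequence(x0=0.3141, r=3.99, n=16)
print(seq[:4])
```

Because the sequence depends sensitively on x0 and r, a slightly different key yields a different sequence, which is what makes the pair usable as an encryption key.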


11.2.2 CONTINUOUS CHAOS (CC) 


Another system used for generating the random sequence is the continuous 
chaotic system, which can be defined by the Lorenz system” as shown in eq 11.2. 


$$\begin{bmatrix}\dot{x}\\ \dot{y}\\ \dot{z}\end{bmatrix} = \begin{bmatrix}-10 & 10 & 0\\ 28 & -1 & 0\\ 0 & 0 & -8/3\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} + \begin{bmatrix}0\\ -xz\\ xy\end{bmatrix} \tag{11.2}$$


The near predictability of the above CC system is removed by adjusting the 
output sequences x, y, and z. A long sequence is then obtained by combining 
the values of all three sequences. This sequence is arranged in 
nondecreasing order, and the new index values are stored for the shuffling 
process. 
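
A minimal sketch of this idea follows, assuming a simple forward-Euler integration of eq 11.2 with step `dt = 0.01` and interleaving as the way of combining x, y, and z. The chapter does not specify the integrator, the combination rule, or how the sequences are adjusted, so all of these are assumptions, as are the function names and the initial state used as the key.

```python
import numpy as np

def lorenz_sequences(x0, y0, z0, n, dt=0.01):
    """Integrate the Lorenz system of eq 11.2 with forward Euler (assumed integrator)."""
    xs, ys, zs = np.empty(n), np.empty(n), np.empty(n)
    x, y, z = x0, y0, z0
    for k in range(n):
        dx = 10.0 * (y - x)
        dy = 28.0 * x - y - x * z
        dz = x * y - (8.0 / 3.0) * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[k], ys[k], zs[k] = x, y, z
    return xs, ys, zs

def cc_permutation(length, key=(0.1, 0.2, 0.3)):
    """Combine x, y, z into one long sequence and return its sort indices."""
    n = -(-length // 3)                       # ceil(length / 3) samples per sequence
    xs, ys, zs = lorenz_sequences(*key, n)
    combined = np.stack([xs, ys, zs], axis=1).ravel()[:length]  # interleave x, y, z
    return np.argsort(combined, kind="stable")  # index values used for shuffling

perm = cc_permutation(64)
print(perm[:8])
```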


11.2.3 GRAY CODE (GC) 


The GC technique?! is a simple and effective permutation method, which 
is defined in eq 11.3. 


$$G = B \oplus (B \gg (g + 1)) \tag{11.3}$$


where B is a k-bit number, G is the k-bit GC value, ⊕ is the binary 
exclusive OR (XOR) operation, g is an integer, and ≫ is the binary 
right shift. The GC of a k-bit number is also a k-bit number. 

To shuffle the original image using GC, the image is first converted into 
a row array of pixels. Consider an example in which GC uses four numbers 
P1, P2, off1, and off2, where off1 and off2 are k-bit numbers. For each 
pixel location A, two GC values X1 and X2 are calculated, where 
X1 = Gray(A, P1) ⊕ off1 and X2 = Gray(A, P2) ⊕ off2. Then, the pixel at 
location X1 is read and placed at location X2 in the permuted image. 
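
The procedure above can be sketched as follows, assuming the row array has a power-of-two length 2^k so that every location is a k-bit number. The function names and the example parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gray(b, g):
    """Generalized Gray code of eq 11.3: G = B xor (B >> (g + 1))."""
    return b ^ (b >> (g + 1))

def gc_permute(pixels, p1, p2, off1, off2):
    """Permute a row array: the pixel read at X1 is written to X2."""
    n = pixels.size
    k = n.bit_length() - 1
    if (1 << k) != n:
        raise ValueError("array length must be a power of two")
    if not (0 <= off1 < n and 0 <= off2 < n):
        raise ValueError("offsets must be k-bit numbers")
    a = np.arange(n)                 # every pixel location A
    x1 = gray(a, p1) ^ off1          # X1 = Gray(A, P1) xor off1
    x2 = gray(a, p2) ^ off2          # X2 = Gray(A, P2) xor off2
    out = np.empty_like(pixels)
    out[x2] = pixels[x1]
    return out

row = np.arange(16)                  # a 4 x 4 image flattened to a row array
shuffled = gc_permute(row, p1=0, p2=1, off1=5, off2=9)
print(shuffled)
```

Both the Gray code and the XOR with an offset are one-to-one on k-bit numbers, so X1 and X2 each visit every location exactly once and the result is a true permutation of the pixels.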


11.2.4 SUDOKU CODE (SC) 


One of the most commonly used permutation methods is Sudoku, in which 
every row contains the same set of elements but in a different order. 
Similarly, every column contains the same set of elements but in a 
different order. The name “Sudoku Code” was inspired by mathematical 
papers by Leonhard Euler.” 


The algorithm for SC is described below: 
Algorithm: Sudoku Code Generation S = Sudoku(p1, p2) 


Require: p1 and p2 are two length-Q sequences 
Ensure: S is a Sudoku code of order K 
1: Kseed = sorting(p1) 
2: Kshift = sorting(p2) 
3: for i = 0 to N-1 do 
4:     S(i, :) = circularshift(Kseed, Kshift(i)) 
5: end for 
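
A runnable sketch of the algorithm above is given here. It assumes that `sorting` returns the sort indices of its argument (so that Kseed and Kshift are permutations of 0..N-1) and that the sequence length, the loop bound, and the code order are all the same number N; the pseudocode does not state either point explicitly.

```python
import numpy as np

def sudoku_code(p1, p2):
    """Build an N x N array whose rows are circular shifts of a seed permutation."""
    kseed = np.argsort(p1, kind="stable")    # assumed meaning of sorting(p1)
    kshift = np.argsort(p2, kind="stable")   # assumed meaning of sorting(p2)
    n = len(kseed)
    s = np.empty((n, n), dtype=int)
    for i in range(n):
        s[i, :] = np.roll(kseed, kshift[i])  # circular shift by Kshift(i)
    return s

rng = np.random.default_rng(1)               # stand-in for two key sequences
S = sudoku_code(rng.random(8), rng.random(8))
print(S)
```

Because the shift amounts are all distinct, every row and every column of S contains each of the N symbols exactly once.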
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11.2.5 ARNOLD CAT MAP (AC) 


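AC is one of the important random shuffling methods”® and is defined 
by eq 11.4, written here in the commonly used generalized form. It uses 
two positive integers, p and q, which can be considered as the key. 


$$\begin{bmatrix}s'\\ t'\end{bmatrix} = \begin{bmatrix}1 & p\\ q & pq+1\end{bmatrix}\begin{bmatrix}s\\ t\end{bmatrix} \bmod N \tag{11.4}$$


where (s, t) and (s', t') are the pixel coordinates of the input and 
permuted image, respectively, and N is the side length of the square image. 
Figure 11.1 shows how various permuted images are obtained by using the 
above permutation techniques. 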
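
As an illustration of eq 11.4, the sketch below applies the generalized Arnold cat map to a square array. The matrix form [[1, p], [q, pq + 1]] mod N is the commonly used one and is assumed here; the function name and the `rounds` parameter are hypothetical.

```python
import numpy as np

def arnold_cat(img, p, q, rounds=1):
    """Move the pixel at (s, t) to (s', t') = ((s + p*t) mod N, (q*s + (p*q + 1)*t) mod N)."""
    n = img.shape[0]
    if img.shape[0] != img.shape[1]:
        raise ValueError("image must be square")
    s, t = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    s_new = (s + p * t) % n
    t_new = (q * s + (p * q + 1) * t) % n
    out = img
    for _ in range(rounds):
        nxt = np.empty_like(out)
        nxt[s_new, t_new] = out[s, t]
        out = nxt
    return out

img = np.arange(64).reshape(8, 8)
scrambled = arnold_cat(img, p=1, q=1)
print(scrambled)
```

The matrix has determinant 1, so the map is one-to-one on the N x N grid and no pixel value is lost.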


[Figure 11.1 panels: original matrix; (a) Chaotic map, (b) Continuous chaos, (c) Sudoku, (d) Arnold cat map] 


FIGURE 11.1 Permuted matrix using various permutation techniques. 


11.3 PROPOSED PARTIAL IMAGE ENCRYPTION (PIE) METHOD 


The architecture of the proposed partial encryption scheme using various 
permutation techniques is shown in Figure 11.2; it consists of a permutation 
stage followed by a mapping stage. At first, the medical input image, whose 
size should be a power of 2, is segmented into nonoverlapping macroblocks; 
the macroblock sizes are defined in Table 11.1. One of the abovementioned 
permutation techniques, specifically the chaotic map, is used to change the 
pixel positions within every segmented block to obtain the various 
intermediate permuted images. In the mapping process, a basic intensity 
image (BII), which contains all pixel values from 0 to 255, is used along 
with one of the permutation methods to obtain the various partial encrypted 
images. The block-wise permutation and mapping processes are described in 
detail below. 

Permutation process: 


[Figure 11.2 block diagram: original medical image, block decomposition, block-wise permutation, mapping stage, partial encrypted images; side inputs: block size table, chaotic map, basic intensity image (BII), permutation technique] 


FIGURE 11.2 Architecture of partial image encryption system for medical images. 


The following steps explain how the input image is permuted using the 
chaotic map system. 


Step 1: Input the plain medical image of size M × N. 

Step 2: Partition the plain medical image into nonoverlapping 
macroblocks according to a predefined block size from Table 11.1. 


TABLE 11.1 Block Size List. 


Sl. no    Block size 
1         4 × 4 
2         8 × 8 
3         16 × 16 
4         32 × 32 
N         (n/2) × (n/2) 


Step 3: Generate the random sequence with the chaotic system of eq 11.1 
using the initial key x0 and r. The chaotic sequence X can be represented as: 


$$X = \{x_0, x_1, x_2, x_3, \ldots, x_{n-1}\}$$


Step 4: Arrange the chaotic sequence X in increasing order and store the 
newly obtained index values. 

Step 5: Using the new position values, permute the positions of the gray 
values in every block. 

Step 6: Merge all the blocks in a nonoverlapping fashion to obtain the 
permuted image. 
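
Steps 1 to 6 can be sketched as below. The sketch assumes that a single chaotic sequence of length block × block is generated once and that the same index order is reused for every block; the chapter does not say whether the sequence is regenerated per block. The function names and key values are illustrative.

```python
import numpy as np

def logistic_sequence(x0, r, n):
    """n iterates of the logistic map of eq 11.1."""
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = r * x * (1.0 - x)
        seq[k] = x
    return seq

def permute_blocks(img, block, x0, r):
    """Shuffle the pixels inside every nonoverlapping block x block macroblock."""
    m, n = img.shape
    if m % block or n % block:
        raise ValueError("image size must be a multiple of the block size")
    # Steps 3-4: chaotic sequence, then the index order that sorts it.
    order = np.argsort(logistic_sequence(x0, r, block * block), kind="stable")
    out = np.empty_like(img)
    for i in range(0, m, block):              # Step 2: walk over macroblocks
        for j in range(0, n, block):
            flat = img[i:i + block, j:j + block].ravel()
            # Step 5: reposition gray values; Step 6: write block back in place.
            out[i:i + block, j:j + block] = flat[order].reshape(block, block)
    return out

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
permuted = permute_blocks(img, block=4, x0=0.3141, r=3.99)
print(permuted)
```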


After block-wise permutation, we get different intermediate permuted 
images. In the mapping stage, one of the different permutation techniques 
is selected and applied to these permuted images. The steps involved in the 
mapping stage are as follows: 


Step 1: Input the BII for the mapping process along with one of the 
permutation techniques. 

Step 2: Convert every pixel of the intermediate permuted image into its 
8-bit binary number. 

Step 3: Split the 8-bit binary number into two 4-bit numbers by grouping 
the most significant 4 bits as the upper nibble and the least significant 
4 bits as the lower nibble. 

Step 4: Convert the upper and lower nibbles into their equivalent decimal 
values. 

Step 5: Use the two decimal values obtained in Step 4 to fetch the gray 
value from the basic intensity mapping image, where the decimal value of 
the upper nibble is treated as the row index and the decimal value of the 
lower nibble is treated as the column index of the mapping image. 
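
The nibble lookup in Steps 2 to 5 can be sketched as follows. The sketch assumes the BII is a 16 × 16 array holding each value 0 to 255 once, already shuffled by the chosen permutation technique; here a seeded random shuffle stands in for that technique, which is an assumption made only for the example.

```python
import numpy as np

def mapping_stage(permuted, bii):
    """Replace each 8-bit pixel by the BII entry at (upper nibble, lower nibble)."""
    upper = permuted >> 4        # most significant 4 bits -> row index
    lower = permuted & 0x0F      # least significant 4 bits -> column index
    return bii[upper, lower]

# Stand-in for a permuted basic intensity image (assumed 16 x 16 layout).
bii = np.random.default_rng(0).permutation(256).astype(np.uint8).reshape(16, 16)

pixels = np.arange(256, dtype=np.uint8).reshape(16, 16)
mapped = mapping_stage(pixels, bii)
print(mapped[:2])
```

Since the BII contains every gray value exactly once, the lookup is a one-to-one substitution of pixel values and can be inverted by anyone who can rebuild the same BII.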


11.4 PERFORMANCE METRIC FOR PROPOSED PARTIAL IMAGE 
ENCRYPTION SCHEME 


To assess the performance of the proposed partial encryption system, the 
following performance metrics have been used for evaluation. 
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11.4.1 MEAN SQUARE ERROR (MSE) 


MSE is calculated between the original image and the encrypted image 
obtained from the proposed system; it gives the average squared difference 
between the original and the encrypted image. Mathematically, MSE is 
defined by eq 11.5.'° 


$$\mathrm{MSE} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[\mathrm{org}(i,j) - \mathrm{enc}(i,j)\right]^{2} \tag{11.5}$$


where M and N are the total numbers of rows and columns of the image, 
org(i,j) is the original input image, and enc(i,j) is the encrypted image. 


11.4.2 PEAK SIGNAL-TO-NOISE RATIO (PSNR) 


PSNR decreases as MSE increases. PSNR reflects the encryption 
quality. Mathematically, PSNR is defined by eq 11.6.! 


$$\mathrm{PSNR} = 20\log_{10}\left(\frac{255}{\sqrt{\mathrm{MSE}}}\right) \tag{11.6}$$


where MSE is the mean square error between the original input image and the 
encrypted image, obtained by using eq 11.5. 
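
Eqs 11.5 and 11.6 translate directly into code. The sketch below assumes 8-bit images held as NumPy arrays; the function names are illustrative. The last line checks the formula against one MSE value reported in Table 11.3.

```python
import numpy as np

def mse(org, enc):
    """Mean square error of eq 11.5."""
    diff = org.astype(np.float64) - enc.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr_from_mse(m):
    """PSNR of eq 11.6 for 8-bit images (peak value 255)."""
    return float("inf") if m == 0 else float(20.0 * np.log10(255.0 / np.sqrt(m)))

def psnr(org, enc):
    return psnr_from_mse(mse(org, enc))

org = np.zeros((2, 2), dtype=np.uint8)
enc = np.full((2, 2), 2, dtype=np.uint8)
print(mse(org, enc), psnr(org, enc))

# An MSE of 31.62 (Table 11.3, GC, column 1) gives a PSNR of about 33.13 dB.
print(round(psnr_from_mse(31.62), 2))
```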


11.4.3 NPCR AND UACI 


The number of pixels change rate (NPCR) is calculated between the 
original input image and the encrypted image. NPCR indicates what 
percentage of the pixels in the original image change in the encrypted 
image. The higher the NPCR, the greater the security and the stronger the 
encryption. Mathematically, NPCR is defined by eq 11.8. The unified average 
changing intensity (UACI) gives the average change in intensity between the 
original and the encrypted image.'* Mathematically, UACI is calculated by 
using eq 11.7. 


$$\mathrm{UACI} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{\left|\mathrm{org}(i,j) - \mathrm{enc}(i,j)\right|}{255} \times 100\% \tag{11.7}$$


$$\mathrm{NPCR} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} D(i,j)}{M \times N} \times 100\% \tag{11.8}$$


where M and N are the numbers of rows and columns of the image, as in 
eq 11.5, and D(i,j) is defined as follows (Table 11.2): 


$$D(i,j) = \begin{cases}1, & I(i,j) \neq E(i,j)\\ 0, & I(i,j) = E(i,j)\end{cases}$$


where I(i,j) and E(i,j) are the original input image and the output cipher 
image, respectively. 
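
A short sketch of eqs 11.7 and 11.8 for 8-bit images follows; the function names are illustrative. The toy example changes half of the pixels by the full intensity range, so both metrics come out at 50%.

```python
import numpy as np

def npcr(org, enc):
    """Percentage of pixel positions whose value differs (eq 11.8)."""
    return float(100.0 * np.mean(org != enc))

def uaci(org, enc):
    """Average absolute intensity change relative to 255, in percent (eq 11.7)."""
    diff = np.abs(org.astype(np.float64) - enc.astype(np.float64))
    return float(100.0 * np.mean(diff / 255.0))

org = np.zeros((2, 2), dtype=np.uint8)
enc = np.array([[0, 255], [0, 255]], dtype=np.uint8)
print(npcr(org, enc), uaci(org, enc))
```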


11.4.4 UNIVERSAL IMAGE QUALITY (UQI) INDEX 


The universal image quality index (UQI) is used for calculating the 
similarity between the original image and the cipher image. The range of 
UQI is [-1, 1], where a value of 1 indicates the highest similarity and a 
value of -1 indicates the lowest similarity. UQI is defined as 
follows:'? 


$$\mathrm{UQI}(x,y) = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\cdot\frac{2\mu_x\mu_y}{\mu_x^2+\mu_y^2}\cdot\frac{2\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2} \tag{11.9}$$


where μ_x and μ_y are the means of x and y, σ_x² and σ_y² are the 
variances of x and y, and σ_xy is the covariance of x and y. 


11.4.5 STRUCTURAL SIMILARITY INDEX MEASURE (SSIM) 


The SSIM is an extended version of the UQI. The range of SSIM is 
[-1, 1], where a value of 1 indicates the highest similarity and a value 
of -1 indicates the lowest similarity. SSIM is defined as follows:'? 


$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{11.10}$$


$$\mathrm{MSSIM} = \frac{1}{M}\sum_{j=1}^{M}\mathrm{SSIM}(x_j, y_j) \tag{11.11}$$


where C1 and C2 are two constants used to stabilize the division when the 
denominator is weak, and x_j and y_j are the contents of the j-th local 
window of the two images, with M here denoting the number of windows. 
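
The following sketch evaluates eq 11.10 over a whole image treated as a single window, which is a simplification of the windowed average in eq 11.11. The default constants C1 = (0.01 × 255)² and C2 = (0.03 × 255)² are the values commonly used for 8-bit images and are an assumption here, since the chapter does not give them. Setting both constants to zero reduces eq 11.10 to the UQI of eq 11.9.

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM of eq 11.10 (no sliding windows)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()       # covariance of x and y
    num = (2 * mx * my + c1) * (2 * cxy + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

def uqi_global(x, y):
    """Single-window UQI of eq 11.9, i.e. eq 11.10 with C1 = C2 = 0."""
    return ssim_global(x, y, c1=0.0, c2=0.0)

x = np.arange(16, dtype=np.uint8).reshape(4, 4)
y = 255 - x                                   # intensity-inverted copy
print(ssim_global(x, x), uqi_global(x, x), uqi_global(x, y))
```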


TABLE 11.2 Results Obtained from Proposed Method for Baby Image. 




11.5 EXPERIMENTAL RESULTS 


In this experiment, we take different images of size 512 × 512. The results 
of the proposed method are tabulated in Tables 11.3 to 11.8. From Tables 
11.3 and 11.4, the amount of encryption in terms of MSE and PSNR is lower 
for the combination of GC mapping and permutation with the lower block size 
(4 × 4) and higher for the combination of CC mapping and permutation with 
the lower block size. There is no specific control over the amount of 
encryption in terms of MSE/PSNR, but the MSE/PSNR can be varied by choosing 
an appropriate block size and permutation technique. From Table 11.5, NPCR 
is lower for the combination of CC mapping and permutation with the lower 
block size (4 × 4) and greater than or equal to about 99 for GC mapping. 
From Table 11.6, UACI is lower for the combination of GC mapping and 
permutation with the lower block size and varies for the remaining 
combinations. From Tables 11.7 and 11.8, SSIM and UQI are higher for the 
combination of GC mapping and permutation with the lower block size and 
vary for the other combinations (Tables 11.7-11.11). 


TABLE 11.3 MSE for Baby Image for Different Permutation Techniques. 
MSE for Baby Image 


PIE List 1 2 3 4 5 6 7 

GC 31.62 35.21 40.24 47.39 56.72 68.85 82.45 
SC 33.10 32.72 32.38 32.40 32.72 33.18 39.19 
AC 28.43 27.84 26.97 26.16 25.00 22.35 15.67 
CC 26.37 26.53 26.68 26.50 25.38 23.45 17.17 


TABLE 11.4 PSNR for Baby Image for Different Permutation Techniques. 


PSNR for Baby Image 
PIE List 1 2 3 4 5 6 7 
GC 33.13 32.66 32.08 31.37 30.59 29.75 28.96 
SC 32.93 32.98 33.02 33.02 32.98 32.92 32.19 
AC 33.59 32.68 33.82 33.95 34.15 34.63 36.17 
CC 33.91 33.89 33.86 33.89 34.08 34.42 35.78 


TABLE 11.5 NPCR for Baby Image for Different Permutation Techniques. 


NPCR for Baby Image 
PIE List 1 2 3 4 5 6 7 
GC 54.92 58.57 60.46 62.29 66.76 73.31 81.12 
SC 99.65 99.63 99.64 99.60 99.54 99.53 99.47 
AC 99.80 99.78 99.79 99.81 99.83 99.84 99.83 
CC 99.71 99.75 99.77 99.78 99.80 99.79 99.79 


TABLE 11.6 UACI for Baby Image for Different Permutation Techniques. 


UACI for Baby Image 
PIE List 1 2 3 4 5 6 7 
GC 4.90 5.73 6.94 8.54 11.13 14.45 19.58 
SC 29.54 29.57 29.63 29.72 29.90 30.31 32.01 
AC 63.40 63.42 63.39 63.36 63.22 62.77 61.30 
CC 57.98 57.98 58.06 58.09 57.99 57.75 56.23 
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TABLE 11.7 SSIM for Baby Image for Different Permutation Techniques. 
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SSIM for Baby Image 
PIE List 1 2 3 4 5 6 7 
GC 0.7426 0.6718 0.5356 0.4110 0.2572 0.1394 0.0271 
SC 0.0166 0.0184 0.0180 0.0177 0.0156 0.0159 0.0152 
AC 0.0122 0.0093 0.0124 0.0117 0.0129 0.0100 0.0095 
CC 0.0185 0.0174 0.0138 0.0116 0.0141 0.0115 0.0113 
TABLE 11.8 UQI for Baby Image for Different Permutation Techniques. 

UQI for Baby Image 
PIE List 1 2 3 4 5 6 7 
GC 0.9287 0.8612 0.7488 0.6186 0.5769 0.4109 0.3439 
SC 0.2376 0.2372 0.2391 0.2458 0.2606 0.2760 0.2795 
AC 0.2435 0.2427 0.2399 0.2341 0.2215 0.2093 0.1881 
CC 0.2479 0.2475 0.2449 0.2397 0.2307 0.2155 0.2005 


TABLE 11.9 Gray Code (GC) Results Obtained from Proposed Method for Lena and 
Pepper Images. 




TABLE 11.10 NPCR and UACI Comparison Between Proposed GC Code Method and 
Existing Method. 


Images    GC code           Ref. [11] 
          NPCR     UACI     NPCR     UACI 
Lena      99.59    29.01    98.69    18.23 
Pepper    99.62    29.97    97.23    22.21 
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TABLE 11.11 MSE and PSNR Comparison Between Proposed GC Code Method and 
Existing Method. 


Images    GC code           Ref. [11] 
          MSE      PSNR     MSE      PSNR 
Lena      89.29    28.62    6801     9.83 
Pepper    94.26    28.38    8051     9.10 


11.6 CONCLUSION 


This chapter proposed a partial image encryption scheme for medical images 
that uses various permutation techniques. The proposed system mainly 
consists of a permutation process and a diffusion process. The block-wise 
permutation of the image is performed using the chaotic system with the 
help of the block size table. In the diffusion stage, the mapping process 
is used to alter the pixel values. The experimental results show that the 
amount of encryption varies with the permutation technique. Depending on 
the requirements of the application, a particular permutation-based partial 
encryption technique can be used. The advantages of the proposed method are 
lower complexity and less computation time. 
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ABSTRACT 


The emergence of artificial intelligence has paved the way for numerous 
developments in the domain of machine vision. One of the many frameworks 
and algorithms that have set a benchmark for the generation of data from 
learned parameters is the Generative Adversarial Network. In this 
chapter, Generative Adversarial Networks (GANs) and similar algorithms, 
such as Variational Auto-Encoders (VAEs), are used to generate handwritten 
digits from noise. Furthermore, the training data has been visualized to 
gain a proper understanding of the data our model is trying to learn. 


12.1 INTRODUCTION 


In machine learning, the generative adversarial network (GAN) is a 
stimulating recent innovation. GANs are used to create new data instances 
that resemble the training data, hence the name generative models. GANs 
achieve this degree of realism by pairing a generator, which learns to 
produce the target output, with a discriminator, which learns to 
distinguish real data from the output of the generator.’ The generator 
tries to fool the discriminator, and the discriminator tries to avoid being 
fooled. “Generative” describes a class of statistical models that contrasts 
with discriminative models. 
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Generally: 


• Generative models can generate new data instances. 
• Discriminative models discriminate between different kinds of data 
instances. 


A generative model could produce new photographs of animals that look like 
real animals; a GAN is one kind of generative model. More formally, given a 
set of data instances X and a set of labels Y: 


• Generative models capture the joint probability p(X, Y), or p(X). 
• Discriminative models capture the conditional probability p(Y | X). 


The difference between discriminative and generative models of 
handwritten* digits is shown in Figure 12.1. A generative model? includes 
the distribution of the data itself. For instance, models that predict the 
next word in a sequence are typically generative models, and are much 
simpler than GANs, because they can assign a probability to a sequence of 
words." 


[Figure 12.1 panels: discriminative model, p(y|x); generative model, p(x, y)] 


FIGURE 12.1 Generative adversarial models for handwritten digits. 


Source: https://developers.google.com/machine-learning/gan/generative?hl=zh-CN. 
https://creativecommons.org/licenses/by/4.0/ 


The discriminative model tries to tell the difference between handwritten 
0’s and 1’s by drawing a separating line in the data space. If it gets the 
line right, it can distinguish 0’s from 1’s without ever having to model 
exactly where the digits are placed in the data space on either side of the 
line. In contrast, the generative model tries to produce convincing 1’s and 
0’s by generating digits that fall close to their real counterparts in the 
data space. It has to model the distribution throughout the data space.'!°?° 


12.2 OUTLINE OF GAN STRUCTURE 
12.2.1. GENERATIVE ADVERSARIAL NETWORK 
A GAN consists of two components: 
The generator learns to produce plausible data. The generated instances 
become negative training examples for the discriminator, as shown in 
Figure 12.2. 


[Figure 12.2 diagram: random input vector, generator model, generated example] 


FIGURE 12.2 Example of the GAN generator model. 


Source: https://developers.google.com/machine-learning/gan/generative?hl=zh-CN. 
https://creativecommons.org/licenses/by/4.0/ 


The discriminator learns to tell apart the generator’s duplicate data 
from the original data. If the discriminator identifies the duplicate data, a 
penalty is imposed on the generator for producing results that can be easily 
identified as shown in Figure 12.3. 
[Figure 12.3 diagram: input example, discriminator model, binary classification (real/fake)] 


FIGURE 12.3 Example of the GAN discriminator model. 
Source: https://developers.google.com/machine-learning/gan/generative?hl=zh-CN. 
https://creativecommons.org/licenses/by/4.0/ 


When training begins, the generator produces obviously fake data, and the 
discriminator quickly learns to tell that it is fake.° As training 
progresses, the generator gets closer to producing output that can fool the 
discriminator. Finally, if generator training goes well, the discriminator 
gets worse at telling the difference between real and fake. It begins to 
classify fake data as real, and its accuracy decreases. Both the generator 
and the discriminator are neural networks. The output of the generator is 
fed directly into the discriminator. Through backpropagation, the 
discriminator’s classification provides a signal that the generator uses to 
update its weights. The overall architecture of the generative adversarial 
network is shown in Figure 12.4. 


12.3 TRAINING DATA 
12.3.1 DISCRIMINATOR TRAINING 


The discriminator’s training data comes from two sources: 


• Real data instances, for example, actual pictures of people. The 
discriminator uses these instances as positive examples during training. 

• Fake data instances created by the generator. The discriminator uses 
these instances as negative examples during training. 


[Figure 12.4 diagram: training set and random input; the generator produces a fake image; the discriminator classifies its input as real or fake] 


FIGURE 12.4 Architecture diagram of GAN. 
Source: https://developers.google.com/machine-learning/gan/generative?hl=zh-CN 


Discriminator training steps: 


1. The discriminator classifies both real data and fake data from the 
generator. 

2. The discriminator loss penalizes the discriminator for misclassifying 
real data as fake or fake data as real. 

3. The discriminator updates its weights through backpropagation from the 
discriminator loss through the discriminator network. 
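
The three steps above can be illustrated with a deliberately tiny model. In this sketch the discriminator is a single logistic unit rather than a neural network, the "real" and "fake" samples are two Gaussian clouds, and the loss is binary cross-entropy; all of these are simplifying assumptions made only to keep the example short and are not the model used in this chapter.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discriminator_step(w, b, real, fake, lr=0.1):
    """One training step: classify, compute the loss, update the weights."""
    x = np.concatenate([real, fake])
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    p = sigmoid(x @ w + b)                               # step 1: classify
    loss = -np.mean(labels * np.log(p + 1e-12)
                    + (1 - labels) * np.log(1 - p + 1e-12))  # step 2: penalty
    grad = p - labels                                    # gradient of the loss w.r.t. logits
    w = w - lr * (x.T @ grad) / len(x)                   # step 3: weight update
    b = b - lr * grad.mean()
    return w, b, float(loss)

rng = np.random.default_rng(0)
real = rng.normal(2.0, 0.5, size=(64, 2))    # stand-in for real instances
fake = rng.normal(-2.0, 0.5, size=(64, 2))   # stand-in for generator output
w, b = np.zeros(2), 0.0
losses = []
for _ in range(50):
    w, b, loss = discriminator_step(w, b, real, fake)
    losses.append(loss)
print(losses[0], losses[-1])
```

With untrained weights the discriminator outputs 0.5 everywhere, so the first loss equals log 2; the loss then falls as the weights are updated.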


12.3.2 THE GENERATOR TRAINING 


The generator part of a GAN learns to create fake data by incorporating 
feedback from the discriminator. It learns to make the discriminator 
classify its output as real. Generator training requires tighter 
integration between the generator and the discriminator than discriminator 
training requires. The portion of the GAN that trains the generator 
includes: 


• Random input. 

• The generator network, which transforms the random input into a data 
instance. 

• The discriminator network, which classifies the generated data. 

• The discriminator output. 

• The generator loss, which penalizes the generator for failing to fool 
the discriminator. 


12.3.3 CONVERGENCE 


As the generator improves with training, the discriminator performance 
gets worse because the discriminator finds it difficult to tell the 
difference between real and fake instances. If the generator succeeds 
perfectly, the discriminator has 50% accuracy. In effect, the discriminator 
flips a coin to make its prediction. 

This progression poses a problem for the convergence of the GAN as a 
whole: the discriminator feedback becomes less meaningful over time.'” 
If the GAN continues training past the point at which the discriminator 
gives feedback that no longer depends on the input data, the generator 
starts to train on unusable feedback. This lowers the quality of the 
generator, which may then collapse completely. For a GAN, convergence is 
often a fleeting state rather than a stable one." 


12.3.4 LOSS FUNCTIONS 


A GAN can have two loss functions: one for generator training and one for 
discriminator training. In both cases, however, the generator can affect 
only one term in the distance measure: the term that reflects the 
distribution of the fake data. So during generator training, we drop the 
other term, which reflects the distribution of the real data. 

The minimax loss° is shown in eq 12.1. The generator tries to minimize the 
following function while the discriminator tries to maximize it: 


$$E_x[\log(D(x))] + E_z[\log(1 - D(G(z)))] \tag{12.1}$$


In the above function, 


• D(x) is the discriminator’s estimate of the probability that the real 
data instance x is real. 

• E_x is the expected value (arithmetic mean) over all real data instances. 

• G(z) is the generator’s output when given noise z. 

• D(G(z)) is the discriminator’s estimate of the probability that a fake 
instance is real. 

• E_z is the expected value over all random inputs to the generator (the 
arithmetic mean over all generated fake instances G(z)). 
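
A small numerical sketch of eq 12.1 follows. It takes the discriminator outputs D(x) and D(G(z)) as given arrays of probabilities and replaces the expectations with sample means; the function name and the example values are illustrative only.

```python
import numpy as np

def minimax_value(d_real, d_fake, eps=1e-12):
    """Sample estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))] (eq 12.1)."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

# A confident, correct discriminator: the value is close to its maximum of 0.
strong = minimax_value([0.99, 0.99], [0.01, 0.01])

# A discriminator that outputs 0.5 for everything: the value is 2 * log(0.5).
fooled = minimax_value([0.5, 0.5], [0.5, 0.5])

print(strong, fooled)
```

The discriminator pushes this value up toward 0, while the generator pushes it down by making D(G(z)) larger.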


One of the most common issues with GANs is their high instability during 
training. This arises because the two CNNs do not, in general, take equal 
time to train individually. If the model starts to train in the wrong 
direction, the latent variables do not train on the right path afterwards, 
which leads to incorrect generation and discrimination of digits. To 
combat this, Variational Auto-Encoders (VAEs) were introduced; their 
architecture is shown in Figure 12.5. 


[Figure 12.5 layers: Dense-500, Dense-120, Dense-30, Sample-30, Dense-120, Dense-500, output] 


FIGURE 12.5 Architecture of a Variational Auto-Encoder. 
Source: https://developers.google.com/machine-learning/gan/generative?hl=zh-CN 
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12.3.5 WORKING OF GANS 


• The generator, a reversed Convolutional Neural Network, accepts noise 
and returns a 28 × 28 image. 

• The discriminator, a Convolutional Neural Network, receives these 
28 × 28 generated images alongside a batch of images fetched from the real 
dataset, that is, the data available to us. 

• The role of the discriminator is to return 0 if the generated image 
does not pass the realism check and 1 if it does. 


12.4 VARIATIONAL AUTO-ENCODERS 


Like GANs, VAEs? are used for the representation of latent variables. 
The issue with plain auto-encoders is that their latent space might not be 
continuous, and there might be problems with interpolation as well. VAEs, 
on the other hand, have continuous latent spaces, allowing for easy random 
sampling and interpolation.'® 
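
Random sampling and interpolation in a continuous latent space can be sketched as below. The sketch assumes the usual VAE parameterization in which the encoder outputs a mean and a log-variance and a latent vector is drawn as mean + standard deviation × noise; the latent size of 30 follows the Sample-30 layer of Figure 12.5, and the function names are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def interpolate(z_a, z_b, steps):
    """Evenly spaced points on the straight line from z_a to z_b."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

rng = np.random.default_rng(0)
mu, log_var = np.zeros(30), np.zeros(30)   # stand-in encoder outputs
z_a = sample_latent(mu, log_var, rng)
z_b = sample_latent(mu, log_var, rng)
path = interpolate(z_a, z_b, steps=5)      # each row could be fed to the decoder
print(path.shape)
```

Because the latent space is continuous, every intermediate row of `path` is a valid input to the decoder, which is what makes smooth transitions between two generated digits possible.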

Because of the instability that GANs face, the model, even on its ideal 
training path, takes a very long time to generate handwritten digit samples 
that fool the discriminator. To combat this, a ConvNet was pretrained on 
the MNIST dataset’ and used as a replacement for the existing discriminator 
to reduce instability among the latent variables. 

On the other hand, VAEs showed exceptional training capacity and 
stability. Even though they required long training periods, they reduced 
the chances of instability by following the right training path and ended 
up with near-perfect generated results. A similar concept could be applied 
to 3D objects stored as point clouds: the data are compressed into a 
voxel-based representation and then fed to 3D convolutional layers instead 
of 2D layers. 


12.5 CASE STUDY 
12.5.1 DATASET 


The MNIST database is a collection of 28 × 28 grayscale images of 
handwritten digits, with 60,000 training samples and 10,000 annotated test 
samples. It is a subset of a larger dataset provided by NIST, as given by 
Yann LeCun, Corinna Cortes, and Christopher J.C. Burges.‘ 


12.5.2 METHODOLOGY 


The MNIST handwritten digits dataset contains 60,000 training and 10,000 
testing images of dimension 28 × 28. There are 10 classes of data, each 
comprising roughly 7000 images of the respective handwritten digit (0-9). 
The dataset was loaded and our constructed model was trained on it for over 
80,000 iterations. At every iteration, the outputs were recorded and 
plotted to check for progress. 

The results of this training are shown in Figure 12.7. Applying t-SNE° to 
our dataset, as demonstrated in Figure 12.6, shows that at around 2000 
epochs we reach a clustering good enough to suggest that a generative 
adversarial model would be able to pick up on the high-level and low-level 
features of the data. 


FIGURE 12.6 Tensor board visualization of the dataset as 10 clusters using t-SNE. 




FIGURE 12.7 Training and generation results. 


After a certain point in training, the outliers and noise in the data 
are reduced and the network starts generating near-perfect renderings 
of the handwritten digits in the dataset. Figure 12.7 shows how well the 
data were analyzed by the network and the images it generated. 


[Figure 12.8 plot: gradient of the generator with the -log D cost; curves after 1, 10, and 25 epochs; x-axis: training iterations] 


FIGURE 12.8 Loss function plot with respect to number of epochs. 


Because of the instability that GANs face, the model, even on its ideal 
training path,'* took a long time to perfect and to generate handwritten 
digit samples that fool the discriminator. To combat this, a ConvNet was 
pretrained on the MNIST dataset and then used as a replacement for the 
existing discriminator to reduce instability among the latent variables. 
On the other hand, VAEs showed exceptional training capacity and stability. 
Even though they required long training periods, they reduced the chances 
of instability by following the right training path and ended up with 
near-perfect generated results. Even though they are computationally 
intensive and sometimes unstable, generative networks hold the potential to 
solve many challenges in artificial intelligence.'° 


12.6 CONCLUSION AND FUTURE WORK 


A similar concept could be applied to 3D objects stored as point clouds. 
The data are compressed into a voxel-based representation and then fed to 
3D convolutional layers instead of 2D layers. Such a concept is applicable 
to many areas of research, such as machine vision, speech, and biological 
and chemical discovery. Even though they are computationally intensive and 
sometimes unstable, generative networks hold the potential to solve many 
challenges in artificial intelligence. 
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